Developing a Telegram bot for text recognition in images, audio synthesis and recognition
In this tutorial, you will create a Telegram bot that can:
- Synthesize speech from a message text and recognize speech in voice messages using the Yandex SpeechKit Python SDK.
- Recognize text in images using Yandex Vision OCR.
Authentication in Yandex Cloud services is performed under a service account using an IAM token. The IAM token is contained in the handler context of the function which manages user conversation with the bot.
The Yandex API Gateway will receive requests from your bot and forward them to Yandex Cloud Functions for processing.
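As a rough illustration of the flow described above, the sketch below shows how a function handler could read the IAM token from its context and decode the Telegram update forwarded by the API gateway. The `context.token["access_token"]` and `event["body"]` lookups are the ones used in the tutorial's `index.py`; the `extract_request` helper name and the stand-in objects are hypothetical and exist only for a local dry run.

```python
import json
from types import SimpleNamespace


def extract_request(event, context):
    """Return the IAM token and the decoded Telegram update.

    Hypothetical helper: it only mirrors the two lookups that the
    tutorial's handler performs on its arguments.
    """
    # The service account's IAM token is exposed through the handler context.
    iam_token = context.token["access_token"]
    # The webhook payload arrives as a JSON string in the "body" field.
    update = json.loads(event["body"])
    return iam_token, update


# Stand-in objects for a local dry run; real values come from Cloud Functions.
fake_context = SimpleNamespace(token={"access_token": "t1.example"})
fake_event = {"body": json.dumps({"update_id": 1, "message": {"text": "hi"}})}

token, update = extract_request(fake_event, fake_context)
print(token, update["message"]["text"])
```

Keeping the token lookup inside the handler, rather than in a module-level constant, means each invocation works with the token supplied for that call.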
To create a bot:
- Get your cloud ready.
- Set up resources.
- Register your Telegram bot.
- Create a function.
- Create an API gateway.
- Link the function and the bot.
- Test the bot.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can navigate to the cloud page.
Learn more about clouds and folders here.
Required paid resources
The cost of Telegram bot support includes:
- Fee for using SpeechKit (see SpeechKit pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for function invocation count, computing resources allocated to run the function, and outbound traffic (see Cloud Functions pricing).
- Fee for the number of requests to the API gateway and outbound traffic (see API Gateway pricing).
Set up resources
- Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
- Download the archive with the FFmpeg package for the SpeechKit Python SDK to work correctly in the function execution environment.
- Create a ZIP archive with the function code:
  - Create a file named index.py and paste the code below into it.

index.py

import logging
import requests
import telebot
import json
import os
import base64
from speechkit import model_repository, configure_credentials, creds
from speechkit.stt import AudioProcessingType

# Service endpoints and authentication credentials
API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'
folder_id = ""
iam_token = ''

# Make the ffmpeg and ffprobe binaries from the archive visible to the SDK
path = os.environ.get("PATH")
os.environ["PATH"] = path + ':/function/code'

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID
def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}', headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}', headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

def process_event(event):
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)
    bot.process_new_updates([update])

def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)

    # Authenticating in SpeechKit with an IAM token
    configure_credentials(
        yandex_credentials=creds.YandexCredentials(
            iam_token=iam_token
        )
    )
    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message handlers
@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n* Recognize text from images.\n* Generate voice messages from text.\n* Convert voice messages to text.")

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    export_path = '/tmp/audio.ogg'
    synthesize(message.text, export_path)
    with open(export_path, 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(downloaded_file)
    bot.reply_to(message, response_text)

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    # The last element of message.photo is the largest available size
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition
def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
        "mimeType": "image",
        "languageCodes": ["en", "ru"],
        "model": "page",
        "content": image_data
    })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition
def audio_analyze(audio_data):
    model = model_repository.recognition_model()

    # Recognition settings
    model.model = 'general'
    model.language = 'ru-RU'
    model.audio_processing_type = AudioProcessingType.Full

    try:
        result = model.transcribe(audio_data)
        speech_text = [res.normalized_text for res in result]
        return ' '.join(speech_text)
    except Exception:
        return 'Cannot recognize message'

# Speech synthesis
def synthesize(text, export_path):
    model = model_repository.synthesis_model()

    # Synthesis settings
    model.voice = 'kirill'

    result = model.synthesize(text, raw_format=False)
    result.export(export_path, 'ogg')
  - Create a file named requirements.txt. In this file, specify the library to use for the bot and the Python SDK library:

    telebot
    yandex-speechkit

  - Add the index.py and requirements.txt files and the ffmpeg and ffprobe binary files from the FFmpeg utility into the index.zip ZIP archive.
- Create an Object Storage bucket and upload the created ZIP archive into it.
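The archive step above can also be scripted. The following is a minimal sketch, assuming the four files named in the tutorial already sit together in one local directory; the `build_archive` helper and the `ARCHIVE_MEMBERS` constant are hypothetical names introduced for illustration. Files are stored at the top level of the archive, which matches the `index.handler` entry point and the `/function/code` path that `index.py` appends to `PATH`.

```python
import zipfile
from pathlib import Path

# Files the tutorial asks to place in the archive.
ARCHIVE_MEMBERS = ["index.py", "requirements.txt", "ffmpeg", "ffprobe"]


def build_archive(source_dir, archive_path="index.zip", members=ARCHIVE_MEMBERS):
    """Pack the listed files from source_dir into a ZIP archive.

    Hypothetical helper; raises FileNotFoundError if any file is absent.
    """
    source_dir = Path(source_dir)
    missing = [name for name in members if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in members:
            # arcname keeps every file at the top level of the archive.
            archive.write(source_dir / name, arcname=name)
    return Path(archive_path)
```

Calling `build_archive(".")` from the directory that holds the files would produce `index.zip` next to them; uploading it to the bucket remains a separate step.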
Register your Telegram bot
Register your bot in Telegram and get a token.
- Start BotFather and send it the following command: /newbot
- In the name field, enter a name for the new bot. This is the name users will see when chatting with the bot.
- In the username field, enter a username for the new bot. You can use it to find the bot in Telegram. The username must end with ...Bot or ..._bot.
Once done, you will get a token. Save it, as you will need it later.
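If you want to check a candidate username before sending it to BotFather, the suffix rule above can be expressed as a one-line test. This is only a sketch of the rule as stated in this tutorial; the `has_bot_suffix` helper is a hypothetical name, and BotFather may apply further checks (length, allowed characters, availability) that are not modeled here.

```python
def has_bot_suffix(username):
    """Return True if the username ends with "Bot" or "_bot" (hypothetical helper)."""
    return username.endswith("Bot") or username.endswith("_bot")


for candidate in ("RecognizerBot", "recognizer_bot", "recognizer"):
    print(candidate, has_bot_suffix(candidate))
```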
Create a function
Create a function to process user actions in the chat.
Management console

- In the management console, select the folder where you want to create a function.
- In the list of services, select Cloud Functions.
- Create a function:
  - Click Create function.
  - Enter the function name: for-recognizer-bot.
  - Click Create.
- Create a function version:
  - Select Python as the runtime environment, disable Add files with code examples, and click Continue.
  - Specify the Object Storage upload method and select the bucket you created earlier. In the Object field, specify the file name: index.zip.
  - Specify the entry point: index.handler.
  - Under Parameters, specify:
    - Timeout: 30.
    - Memory: 128 MB.
    - Service account: recognizer-bot-sa.
    - Environment variables:
      - TELEGRAM_TOKEN: Your Telegram bot token.
  - Click Save changes.
CLI

- Create a function named for-recognizer-bot:

  yc serverless function create --name=for-recognizer-bot

  Result:

  id: b09bhaokchn9********
  folder_id: aoek49ghmknn********
  created_at: "2023-03-21T10:03:37.475Z"
  name: for-recognizer-bot
  log_group_id: eolm8aoq9vcp********
  http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
  status: ACTIVE

- Create a version of the for-recognizer-bot function:

  yc serverless function version create \
    --function-name for-recognizer-bot \
    --memory=128m \
    --execution-timeout=30s \
    --runtime=python312 \
    --entrypoint=index.handler \
    --service-account-id=<service_account_ID> \
    --environment TELEGRAM_TOKEN=<bot_token> \
    --package-bucket-name=<bucket_name> \
    --package-object-name=index.zip

  Where:
  - --function-name: Name of the function whose version you are creating.
  - --memory: Amount of RAM.
  - --execution-timeout: Maximum function running time before timeout.
  - --runtime: Runtime environment.
  - --entrypoint: Entry point.
  - --service-account-id: recognizer-bot-sa service account ID.
  - --environment: Environment variables.
  - --package-bucket-name: Bucket name.
  - --package-object-name: File key in the bucket: index.zip.

  Result:

  done (1s)
  id: d4e6qqlh53nu********
  function_id: d4emc80mnp5n********
  created_at: "2025-03-22T16:49:41.800Z"
  runtime: python312
  entrypoint: index.handler
  resources:
    memory: "134217728"
  execution_timeout: 30s
  service_account_id: aje20nhregkc********
  image_size: "4096"
  status: ACTIVE
  tags:
    - $latest
  log_group_id: ckgmc3l93cl0********
  environment:
    TELEGRAM_TOKEN: <bot_token>
  log_options:
    folder_id: b1g86q4m5vej********
Terraform

- In the configuration file, describe the function settings:

  resource "yandex_function" "for-recognizer-bot-function" {
    name               = "for-recognizer-bot"
    user_hash          = "first function"
    runtime            = "python312"
    entrypoint         = "index.handler"
    memory             = "128"
    execution_timeout  = "30"
    service_account_id = "aje20nhregkcvu******"
    environment = {
      TELEGRAM_TOKEN = "<bot_token>"
    }
    package {
      bucket_name = "<bucket_name>"
      object_name = "index.zip"
    }
  }

  Where:
  - name: Function name.
  - user_hash: Custom string to define the function version.
  - runtime: Function runtime environment.
  - entrypoint: Entry point.
  - memory: Amount of memory allocated for the function, in MB.
  - execution_timeout: Function execution timeout.
  - service_account_id: recognizer-bot-sa service account ID.
  - environment: Environment variables.
  - package: Name of the bucket containing the uploaded index.zip archive with the function source code.

  For more information about yandex_function properties, see this Terraform article.

- Make sure the configuration files are correct.
  - In the command line, navigate to the directory where you created the configuration file.
  - Run a check using this command:

    terraform plan

  If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.

- Deploy the cloud resources.
  - If the configuration does not contain any errors, run this command:

    terraform apply

  - Confirm creating the function by typing yes in the terminal and pressing Enter.
API

To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.
To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
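One detail that can look surprising in the CLI output for this section: the version is created with `--memory=128m`, while the result reports `memory: "134217728"`. The two values agree if the result is expressed in bytes, as the short arithmetic check below suggests. The `megabytes_to_bytes` helper is a hypothetical name used only for this illustration.

```python
def megabytes_to_bytes(megabytes):
    """Convert a size in binary megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


print(megabytes_to_bytes(128))  # 134217728
```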
Create an API gateway
The Telegram server will notify your bot of new messages using a webhook. The API gateway will receive these requests and forward them to the for-recognizer-bot function for processing.
Management console

- In the management console, select the folder where you want to create an API gateway.
- In the list of services, select API Gateway.
- Click Create API gateway.
- In the Name field, enter recognizer-bot-api-gw.
- Under Specification, add the following specification:

  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function

  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.

- Click Create.
- Select the created API gateway. Save the Default domain field value. You will need it later.
CLI

- Save the following specification to spec.yaml:

  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function

  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.

- Run this command:

  yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml

  Where:
  - --name: API gateway name.
  - --spec: Specification file.

  Result:

  done (5s)
  id: d5d1ud9bli1e********
  folder_id: b1gc1t4cb638********
  created_at: "2023-09-25T16:01:48.926Z"
  name: recognizer-bot-api-gw
  status: ACTIVE
  domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
  log_group_id: ckgefpleo5eg********
  connectivity: {}
  log_options:
    folder_id: b1gc1t4cb638********
Terraform

To create an API gateway:

- Describe the yandex_api_gateway properties in the configuration file:

  resource "yandex_api_gateway" "recognizer-bot-api-gw" {
    name = "recognizer-bot-api-gw"
    spec = <<-EOT
      openapi: 3.0.0
      info:
        title: Sample API
        version: 1.0.0
      paths:
        /for-recognizer-bot-function:
          post:
            x-yc-apigateway-integration:
              type: cloud_functions
              function_id: <function_ID>
              service_account_id: <service_account_ID>
            operationId: for-recognizer-bot-function
    EOT
  }

  Where:
  - name: API gateway name.
  - spec: API gateway specification.

  For more information about resource properties, see this Terraform article.

- Make sure the configuration files are correct.
  - In the command line, navigate to the directory where you created the configuration file.
  - Run a check using this command:

    terraform plan

  If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.

- Deploy the cloud resources.
  - If the configuration does not contain any errors, run this command:

    terraform apply

  - Confirm creating the resources: type yes in the terminal and press Enter.
API

To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.
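Whichever interface you use, the specification needs the same two substitutions: the function ID and the service account ID. A minimal sketch of filling them in programmatically is shown below. It assumes the specification text from this section; the `render_spec` helper and the sample IDs in the usage line are hypothetical.

```python
from string import Template

# Specification from this section with the two placeholders turned into
# Template fields.
SPEC_TEMPLATE = Template("""\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: $function_id
        service_account_id: $service_account_id
      operationId: for-recognizer-bot-function
""")


def render_spec(function_id, service_account_id):
    """Return the specification text with both IDs filled in (hypothetical helper)."""
    return SPEC_TEMPLATE.substitute(
        function_id=function_id,
        service_account_id=service_account_id,
    )


# Made-up IDs for illustration; use the real IDs of your resources.
print(render_spec("d4e-example-function", "aje-example-account"))
```

The rendered text could then be saved as spec.yaml for the CLI or pasted into the Terraform heredoc.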
Configure a link between the function and the Telegram bot
Install a webhook for your Telegram bot:
curl --request POST \
--url https://api.telegram.org/bot<bot_token>/setWebhook \
--header 'content-type: application/json' \
--data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'
Where:
- <bot_token>: Telegram bot token.
- <API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway.
Result:
{"ok":true,"result":true,"description":"Webhook was set"}
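For readers who prefer to script this step, the same call can be sketched in Python with only the standard library. The example below builds the request separately from sending it, so the URL and payload can be inspected first. The function names are hypothetical; the `setWebhook` method and its `url` field come from the curl command above. Because Telegram accepts only HTTPS webhook URLs, the sketch adds `https://` when the domain value has no scheme.

```python
import json
import urllib.request


def build_set_webhook_request(bot_token, gateway_domain,
                              path="/for-recognizer-bot-function"):
    """Build (but do not send) the setWebhook request (hypothetical helper)."""
    # Add the HTTPS scheme when the saved domain value does not include one.
    if "://" not in gateway_domain:
        gateway_domain = "https://" + gateway_domain
    payload = {"url": gateway_domain.rstrip("/") + path}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_webhook(bot_token, gateway_domain):
    """Send the request and return Telegram's decoded JSON reply."""
    request = build_set_webhook_request(bot_token, gateway_domain)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `set_webhook("<bot_token>", "<API_gateway_domain>")` with real values should return a dictionary equivalent to the JSON result shown above.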
Test the bot
Chat with the bot:
- Open Telegram and search for the bot by the specified username.
- Send /start to the chat.
  The bot should respond with:

  The bot can do the following:
  * Recognize text from images.
  * Generate voice messages from text.
  * Convert voice messages to text.

- Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
- Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech.
- Send an image with text to the chat. The bot will respond with a message containing the recognized text.
  Note
  The image must meet these requirements.
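To see how the reply to an image is assembled, the sketch below flattens a recognition result into plain text, one output line per recognized line. It follows the same `result` → `textAnnotation` → `blocks` → `lines` → `words` path that `image_analyze` in `index.py` walks. The `extract_text` helper is a hypothetical name, and `sample_response` is a hand-made stand-in trimmed to those fields, not real service output.

```python
def extract_text(ocr_response):
    """Join recognized words into lines of text (hypothetical helper)."""
    blocks = ocr_response["result"]["textAnnotation"]["blocks"]
    lines = []
    for block in blocks:
        for line in block["lines"]:
            # Words within a line are separated by single spaces.
            lines.append(" ".join(word["text"] for word in line["words"]))
    return "\n".join(lines)


# Hand-made stand-in containing only the fields read above.
sample_response = {
    "result": {
        "textAnnotation": {
            "blocks": [
                {"lines": [{"words": [{"text": "Hello"}, {"text": "world"}]}]},
                {"lines": [{"words": [{"text": "Second"}, {"text": "line"}]}]},
            ]
        }
    }
}

print(extract_text(sample_response))
```

Unlike the loop in `index.py`, this variant does not leave a trailing space at the end of each line.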
How to delete the resources you created
To stop paying for the resources you created: