Developing a Telegram bot for text recognition in images, audio synthesis and recognition
In this tutorial, you will create a Telegram bot that can:
- Synthesize speech from a message text and recognize speech in voice messages using the Yandex SpeechKit Python SDK.
- Recognize text in images using Yandex Vision OCR.
The bot authenticates to Yandex Cloud services under a service account using an IAM token. The function that handles the conversation with users receives this IAM token in its handler context.
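For local testing, the sketch below shows one way to read the IAM token from the handler context. It mirrors what the tutorial's handler does with `context.token["access_token"]` and `context.function_version`. The helper name `extract_auth` and the `SimpleNamespace` stub context are hypothetical. In Cloud Functions, the runtime supplies the real context object.

```python
from types import SimpleNamespace


def extract_auth(context):
    """Return (iam_token, version_id) from a Cloud Functions handler context.

    Hypothetical helper: it reads the same attributes as the tutorial's handler.
    """
    token = getattr(context, "token", None)
    # Assumption: without a service account attached, no usable token is provided
    if not token or "access_token" not in token:
        raise RuntimeError("No IAM token in context; attach a service account to the function")
    return token["access_token"], context.function_version


# Stub context for a local run; real values come from the Cloud Functions runtime
fake_context = SimpleNamespace(
    token={"access_token": "t1.example-token"},
    function_version="d4e6qqlh53nu-example",
)
print(extract_auth(fake_context))
```

Failing early with a clear message makes a missing service account easier to diagnose than a `KeyError` deep inside the handler.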
Yandex API Gateway will receive the webhook requests that Telegram sends for your bot and forward them to Yandex Cloud Functions for processing.
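Each webhook call reaches the function as an event whose `body` field holds the Telegram Update as a JSON string. The handler parses the update, and telebot picks a message handler by content type. The sketch below reproduces that routing decision with the standard library only. The function name `route_update` and the returned labels are hypothetical. The field names (`message`, `text`, `voice`, `photo`) follow the Telegram Bot API Update and Message objects.

```python
import json


def route_update(event):
    """Return the name of the handler a webhook event would reach (hypothetical helper)."""
    # API Gateway passes the Telegram Update to the function as a JSON string in event['body']
    update = json.loads(event["body"])
    message = update.get("message")
    if message is None:
        return None  # other update types (edited messages, callbacks, ...) are not handled
    if "text" in message:
        words = message["text"].split()
        # Commands may carry a bot mention, e.g. /start@my_bot
        command = words[0].split("@")[0] if words else ""
        return "welcome" if command in ("/start", "/help") else "synthesize"
    if "voice" in message:
        return "recognize_speech"
    if "photo" in message:
        return "recognize_image"
    return None


event = {"body": json.dumps({"update_id": 1, "message": {"message_id": 7, "text": "/start"}})}
print(route_update(event))  # welcome
```

The order of the checks matches the bot, where the `/start` and `/help` command handler is registered before the generic text handler.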
To create a bot:
- Get your cloud ready.
- Set up resources.
- Register your Telegram bot.
- Create a function.
- Create an API gateway.
- Link the function and the bot.
- Test the bot.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can navigate to the cloud page
Learn more about clouds and folders here.
Required paid resources
The cost of running the Telegram bot includes:
- Fee for using SpeechKit (see SpeechKit pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for function invocation count, computing resources allocated to run the function, and outbound traffic (see Cloud Functions pricing).
- Fee for the number of requests to the API gateway and outbound traffic (see API Gateway pricing).
Set up resources
- Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
- Download the archive with the FFmpeg package, which the SpeechKit Python SDK needs to work correctly in the function execution environment.
- Extract the ffmpeg and ffprobe binary files from the archive and run these commands to make them executable:
  chmod +x ffmpeg
  chmod +x ffprobe
- Create a ZIP archive with the function code:
  - Create a file named index.py and paste the code below into it.
    index.py

import logging
import requests
import telebot
import json
import os
import base64
from speechkit import model_repository, configure_credentials, creds
from speechkit.stt import AudioProcessingType

# Folder ID and IAM token, set on every invocation in handler()
folder_id = ''
iam_token = ''

# Telegram bot token and the Vision OCR endpoint
API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'

# Adding the folder with ffmpeg and ffprobe to the system PATH
path = os.environ.get('PATH', '')
os.environ['PATH'] = path + ':/function/code'

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)


# Getting the folder ID from the function version and function metadata
def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    function_id_req = requests.get(
        f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
        headers=headers)
    function_id = function_id_req.json()['functionId']
    folder_id_req = requests.get(
        f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
        headers=headers)
    return folder_id_req.json()['folderId']


# Passing the Telegram update from the request body to telebot
def process_event(event):
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)
    bot.process_new_updates([update])


def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token['access_token']
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)
    # Authenticating in SpeechKit with an IAM token
    configure_credentials(
        yandex_credentials=creds.YandexCredentials(
            iam_token=iam_token
        )
    )
    process_event(event)
    return {
        'statusCode': 200
    }


# Command and message handlers
@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n"
                 "* Recognize text from images.\n"
                 "* Generate voice messages from text.\n"
                 "* Convert voice messages to text.")


@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    export_path = '/tmp/audio.ogg'
    synthesize(message.text, export_path)
    with open(export_path, 'rb') as voice:
        bot.send_voice(message.chat.id, voice)


@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_info = bot.get_file(message.voice.file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(downloaded_file)
    # Telegram rejects empty messages, so reply with a placeholder if nothing was recognized
    bot.reply_to(message, response_text if response_text.strip() else 'No speech recognized.')


@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    # The last item in message.photo is usually the largest size
    file_info = bot.get_file(message.photo[-1].file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text if response_text.strip() else 'No text recognized.')


# Image recognition with Vision OCR
def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(
        vision_url,
        headers={'Authorization': 'Bearer ' + iam_token, 'x-folder-id': folder_id},
        json={
            'mimeType': 'image',
            'languageCodes': ['en', 'ru'],
            'model': 'page',
            'content': image_data
        })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text


# Speech recognition
def audio_analyze(audio_data):
    model = model_repository.recognition_model()
    # Recognition settings
    model.model = 'general'
    model.language = 'ru-RU'
    model.audio_processing_type = AudioProcessingType.Full
    result = model.transcribe(audio_data)
    speech_text = [res.normalized_text for res in result]
    return ' '.join(speech_text)


# Speech synthesis
def synthesize(text, export_path):
    model = model_repository.synthesis_model()
    # Synthesis settings
    model.voice = 'kirill'
    result = model.synthesize(text, raw_format=False)
    result.export(export_path, 'ogg')

  - Create a file named requirements.txt. In this file, specify the library used for the bot (pyTelegramBotAPI) and the SpeechKit Python SDK library (yandex-speechkit):
    pyTelegramBotAPI==4.27
    yandex-speechkit==1.5.0
  - Add the index.py, requirements.txt, ffmpeg, and ffprobe files to the root of a ZIP archive named index.zip.
- Create an Object Storage bucket and upload the ZIP archive you created to it.
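If you prefer to script the packaging step, the sketch below builds the archive with Python's standard `zipfile` module. `ZipFile.write` records each file's Unix permission bits in the entry's external attributes, so `ffmpeg` and `ffprobe` keep the executable flag set by `chmod +x`. The names `build_package` and `is_executable_in_zip` are hypothetical, and the sketch assumes all four files sit in one source directory. The archive name `index.zip` matches the object name used later in this tutorial.

```python
import os
import stat
import zipfile

# Files that the tutorial places in the root of the archive
PACKAGE_FILES = ["index.py", "requirements.txt", "ffmpeg", "ffprobe"]


def build_package(src_dir=".", archive="index.zip", files=PACKAGE_FILES):
    """Zip the function files flat (no subfolders) and return the archive path."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            # arcname=name keeps every file at the archive root, next to index.py
            zf.write(os.path.join(src_dir, name), arcname=name)
    return archive


def is_executable_in_zip(archive, name):
    """Check the owner-execute bit stored in the archive entry's external attributes."""
    with zipfile.ZipFile(archive) as zf:
        mode = zf.getinfo(name).external_attr >> 16
    return bool(mode & stat.S_IXUSR)
```

For example, run `build_package()` from the directory containing the four files, then `is_executable_in_zip("index.zip", "ffmpeg")` should return `True` if the `chmod +x` step was done.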
Register your Telegram bot
Register your bot in Telegram and get a token.
- Start BotFather and send it the following command:
  /newbot
- In the name field, enter a name for the new bot. This is the name users will see when chatting with the bot.
- In the username field, enter a username for the new bot. You can use it to find the bot in Telegram. The username must end with ...Bot or ..._bot.
Once done, you will get a token. Save it, as you will need it later.
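Before using the token in the function, you can check it with the Bot API `getMe` method, which returns basic information about the bot the token belongs to. The helpers below are a hypothetical sketch that uses only the standard library. `bot_api_url` builds the `https://api.telegram.org/bot<token>/<method>` URL format that this tutorial also uses for `setWebhook`.

```python
import json
import urllib.request


def bot_api_url(token, method):
    """Build a Telegram Bot API method URL for the given bot token."""
    return f"https://api.telegram.org/bot{token}/{method}"


def get_me(token, timeout=10):
    """Call getMe and return the bot's username; raises on HTTP or API errors."""
    with urllib.request.urlopen(bot_api_url(token, "getMe"), timeout=timeout) as resp:
        payload = json.load(resp)
    if not payload.get("ok"):
        raise RuntimeError(payload.get("description", "getMe failed"))
    return payload["result"]["username"]
```

With a valid token, `get_me("<bot_token>")` should return the username you entered in BotFather. With an invalid token, Telegram answers with an HTTP error, which `urlopen` raises as an exception.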
Create a function
Create a function to process user actions in the chat.
- In the management console, select the folder where you want to create a function.
- In the list of services, select Cloud Functions.
- Create a function:
  - Click Create function.
  - Enter the function name: for-recognizer-bot.
  - Click Create.
- Create a function version:
  - Select Python as the runtime environment, disable Add files with code examples, and click Continue.
  - Select Object Storage as the upload method and choose the bucket you created earlier. In the Object field, specify the file name: index.zip.
  - Specify the entry point: index.handler.
  - Under Parameters, specify:
    - Timeout: 30 seconds.
    - Memory: 256 MB.
    - Service account: recognizer-bot-sa.
    - Environment variables:
      - TELEGRAM_TOKEN: Your Telegram bot token.
  - Click Save changes.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
- Create a function named for-recognizer-bot:
  yc serverless function create --name=for-recognizer-bot
  Result:
  id: b09bhaokchn9********
  folder_id: aoek49ghmknn********
  created_at: "2023-03-21T10:03:37.475Z"
  name: for-recognizer-bot
  log_group_id: eolm8aoq9vcp********
  http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
  status: ACTIVE
- Create a version of the for-recognizer-bot function:
  yc serverless function version create \
    --function-name for-recognizer-bot \
    --memory=256m \
    --execution-timeout=30s \
    --runtime=python312 \
    --entrypoint=index.handler \
    --service-account-id=<service_account_ID> \
    --environment TELEGRAM_TOKEN=<bot_token> \
    --package-bucket-name=<bucket_name> \
    --package-object-name=index.zip
  Where:
  - --function-name: Name of the function whose version you are creating.
  - --memory: Amount of RAM.
  - --execution-timeout: Maximum function running time before timeout.
  - --runtime: Runtime environment.
  - --entrypoint: Entry point.
  - --service-account-id: recognizer-bot-sa service account ID.
  - --environment: Environment variables.
  - --package-bucket-name: Name of the bucket containing the ZIP archive.
  - --package-object-name: Key of the archive object in the bucket, index.zip.
  Result:
  done (1s)
  id: d4e6qqlh53nu********
  function_id: d4emc80mnp5n********
  created_at: "2025-03-22T16:49:41.800Z"
  runtime: python312
  entrypoint: index.handler
  resources:
    memory: "268435456"
  execution_timeout: 30s
  service_account_id: aje20nhregkc********
  image_size: "4096"
  status: ACTIVE
  tags:
    - $latest
  log_group_id: ckgmc3l93cl0********
  environment:
    TELEGRAM_TOKEN: <bot_token>
  log_options:
    folder_id: b1g86q4m5vej********
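The command output reports memory in bytes. `--memory=256m` corresponds to 256 × 1024 × 1024 = 268435456 bytes, which is the `resources.memory` value shown. The small hypothetical helper below performs that conversion, which can help when comparing CLI flags with command output. It assumes binary units (k, m, and g as powers of 1024), which is consistent with the value above.

```python
# Binary multipliers: 256m -> 256 * 1024 * 1024 = 268435456, as in the result above
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_to_bytes(value):
    """Convert a size string such as '256m' to bytes (hypothetical helper)."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # no suffix: the value is already in bytes


print(memory_to_bytes("256m"))  # 268435456
```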
With Terraform
Terraform is distributed under the Business Source License
For more information about the provider resources, see the relevant documentation on the Terraform
If you do not have Terraform yet, install it and configure the Yandex Cloud provider.
- In the configuration file, describe the function settings:
  resource "yandex_function" "for-recognizer-bot-function" {
    name               = "for-recognizer-bot"
    user_hash          = "first function"
    runtime            = "python312"
    entrypoint         = "index.handler"
    memory             = "256"
    execution_timeout  = "30"
    service_account_id = "<service_account_ID>"
    environment = {
      TELEGRAM_TOKEN = "<bot_token>"
    }
    package {
      bucket_name = "<bucket_name>"
      object_name = "index.zip"
    }
  }
  Where:
  - name: Function name.
  - user_hash: Custom string to define the function version.
  - runtime: Function runtime environment.
  - entrypoint: Entry point.
  - memory: Amount of memory allocated for the function, in MB.
  - execution_timeout: Function execution timeout, in seconds.
  - service_account_id: recognizer-bot-sa service account ID.
  - environment: Environment variables.
  - package: Location of the index.zip archive with the function source code: the bucket name and the object name.
  For more information about yandex_function properties, see the relevant provider documentation.
- Make sure the configuration files are correct.
  - In the command line, navigate to the directory where you created the configuration file.
  - Run a check using this command:
    terraform plan
  If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.
- Deploy the cloud resources.
  - If the configuration does not contain any errors, run this command:
    terraform apply
  - Confirm creating the function by typing yes in the terminal and pressing Enter.
To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.
To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
Create an API gateway
The Telegram server will notify your bot of new messages using a webhook. The API gateway will receive these webhook requests and forward them to the for-recognizer-bot function for processing.
- In the management console, select the folder where you want to create an API gateway.
- In the list of services, select API Gateway.
- Click Create API gateway.
- In the Name field, enter recognizer-bot-api-gw.
- Under Specification, add the following specification:
  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function
  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.
- Click Create.
- Select the created API gateway. Save the Default domain field value. You will need it later.
- Save the following specification to spec.yaml:
  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function
  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.
- Run this command:
  yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
  Where:
  - --name: API gateway name.
  - --spec: Specification file.
  Result:
  done (5s)
  id: d5d1ud9bli1e********
  folder_id: b1gc1t4cb638********
  created_at: "2023-09-25T16:01:48.926Z"
  name: recognizer-bot-api-gw
  status: ACTIVE
  domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
  log_group_id: ckgefpleo5eg********
  connectivity: {}
  log_options:
    folder_id: b1gc1t4cb638********
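Instead of editing the placeholders in spec.yaml by hand, you can render the specification from a template. The sketch below uses `string.Template` from the Python standard library and writes the same OpenAPI document as above. The function name `render_spec` and the example IDs are hypothetical. Substitute the IDs of your for-recognizer-bot function and recognizer-bot-sa service account.

```python
from string import Template

# Same specification as above, with $function_id and $sa_id as template fields
SPEC_TEMPLATE = Template("""\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: $function_id
        service_account_id: $sa_id
      operationId: for-recognizer-bot-function
""")


def render_spec(function_id, sa_id, path="spec.yaml"):
    """Fill in the IDs, write the API gateway specification to a file, and return it."""
    # substitute() raises KeyError if a field is missing, so no placeholder slips through
    text = SPEC_TEMPLATE.substitute(function_id=function_id, sa_id=sa_id)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

After calling `render_spec("<function_ID>", "<service_account_ID>")` with your real IDs, run the `yc serverless api-gateway create` command as shown.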
To create an API gateway:
- Describe the yandex_api_gateway properties in the configuration file:
  resource "yandex_api_gateway" "recognizer-bot-api-gw" {
    name = "recognizer-bot-api-gw"
    spec = <<-EOT
      openapi: 3.0.0
      info:
        title: Sample API
        version: 1.0.0
      paths:
        /for-recognizer-bot-function:
          post:
            x-yc-apigateway-integration:
              type: cloud_functions
              function_id: <function_ID>
              service_account_id: <service_account_ID>
            operationId: for-recognizer-bot-function
    EOT
  }
  Where:
  - name: API gateway name.
  - spec: API gateway specification.
  For more information about resource properties, see this Terraform article.
- Make sure the configuration files are correct.
  - In the command line, navigate to the directory where you created the configuration file.
  - Run a check using this command:
    terraform plan
  If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.
- Deploy the cloud resources.
  - If the configuration does not contain any errors, run this command:
    terraform apply
  - Confirm creating the resources: type yes in the terminal and press Enter.
To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.
Configure a link between the function and the Telegram bot
Set up a webhook for your Telegram bot:
curl --request POST \
--url https://api.telegram.org/bot<bot_token>/setWebhook \
--header 'content-type: application/json' \
--data '{"url": "https://<API_gateway_domain>/for-recognizer-bot-function"}'
Where:
- <bot_token>: Telegram bot token.
- <API_gateway_domain>: Default domain of the recognizer-bot-api-gw API gateway.
Result:
{"ok":true,"result":true,"description":"Webhook was set"}
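The same call can be made from Python with only the standard library, which is convenient if you also want to confirm the result with the Bot API `getWebhookInfo` method. This is a sketch: `webhook_request`, `call_bot_api`, and `webhook_info` are hypothetical helper names. The gateway domain is assumed to be the bare default domain, without `https://`, as it appears in the API gateway output.

```python
import json
import urllib.request

BOT_API = "https://api.telegram.org/bot{token}/{method}"


def webhook_request(token, gateway_domain, path="/for-recognizer-bot-function"):
    """Build a setWebhook request pointing Telegram at the API gateway route."""
    # Telegram only delivers updates to HTTPS webhook URLs
    body = json.dumps({"url": f"https://{gateway_domain}{path}"}).encode("utf-8")
    return urllib.request.Request(
        BOT_API.format(token=token, method="setWebhook"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_bot_api(request, timeout=10):
    """Send a request to the Bot API and return the decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def webhook_info(token):
    """Ask getWebhookInfo which webhook settings Telegram currently has for this bot."""
    request = urllib.request.Request(BOT_API.format(token=token, method="getWebhookInfo"))
    return call_bot_api(request)["result"]
```

`call_bot_api(webhook_request("<bot_token>", "<API_gateway_domain>"))` should return the same reply as the curl command above. Afterwards, `webhook_info("<bot_token>")["url"]` should show the gateway URL.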
Test the bot
Chat with the bot:
- Open Telegram and search for the bot by the username you specified.
- Send /start to the chat.
  The bot should respond with:
  The bot can do the following:
  * Recognize text from images.
  * Generate voice messages from text.
  * Convert voice messages to text.
- Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
- Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech. The function is configured to recognize Russian speech (model.language = 'ru-RU').
- Send an image with text to the chat. The bot will respond with a message containing the recognized text.
Note
The image must meet these requirements.
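The recognized text that the bot sends back is assembled by `image_analyze` in index.py, which walks the Vision OCR response from blocks to lines to words. The sketch below isolates that step so you can test it without calling the service. It assumes the response shape that the tutorial's code reads (`result` → `textAnnotation` → `blocks` → `lines` → `words` → `text`), and the sample response is made up for illustration. The hypothetical `ocr_text` function joins words with single spaces and avoids the trailing spaces the original loop produces.

```python
def ocr_text(response_json):
    """Join recognized words into lines, one output line per OCR line."""
    blocks = response_json["result"]["textAnnotation"]["blocks"]
    lines = []
    for block in blocks:
        for line in block["lines"]:
            lines.append(" ".join(word["text"] for word in line["words"]))
    return "\n".join(lines)


# Made-up sample in the shape that image_analyze expects
sample = {"result": {"textAnnotation": {"blocks": [
    {"lines": [
        {"words": [{"text": "Hello"}, {"text": "world"}]},
        {"words": [{"text": "Second"}, {"text": "line"}]},
    ]},
]}}}
print(ocr_text(sample))
```

An image without text yields an empty string. Telegram rejects empty messages, so the bot should reply with a placeholder in that case.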
How to delete the resources you created
To stop paying for the resources you created: