Creating a Telegram bot for text recognition in images, speech synthesis, and audio recognition
In this tutorial, you will learn how to create a Telegram bot that can:
- Convert text messages to speech and transcribe voice messages using the Yandex SpeechKit Python SDK.
- Recognize text in images with Yandex Vision OCR.
Authentication in Yandex Cloud services is performed using a service account with an IAM token. The IAM token is available in the context of the handler function, which manages user interaction with the bot.
A Yandex API Gateway API gateway will accept requests from your bot and forward them to the Yandex Cloud Functions handler function for processing.
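To make this flow concrete, here is a minimal local sketch of what the handler function works with. It assumes the event carries the Telegram update as a JSON string in body and that the context exposes token["access_token"], which is how the index.py code in this tutorial reads them. The fake_context and fake_event values are hypothetical stand-ins, not real service output.

```python
import json
from types import SimpleNamespace


def read_iam_token(context):
    """Return the service account IAM token exposed by the function context."""
    return context.token["access_token"]


def read_update(event):
    """Parse the Telegram update that the API gateway forwards in the request body."""
    return json.loads(event["body"])


# Hypothetical stand-ins for the objects passed to the handler function.
fake_context = SimpleNamespace(
    token={"access_token": "t1.example-token"},
    function_version="example-version-id",
)
fake_event = {"body": json.dumps({"update_id": 1, "message": {"text": "hello"}})}

print(read_iam_token(fake_context))
print(read_update(fake_event)["message"]["text"])
```

Running the sketch locally exercises only the parsing logic. It does not contact Telegram or Yandex Cloud.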
To create a bot:
- Get your cloud ready.
- Set up required resources.
- Register your Telegram bot.
- Create a function.
- Create an API gateway.
- Bind the handler function to the bot.
- Test the bot.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.
Learn more about clouds and folders in the Yandex Cloud documentation.
Required paid resources
The cost of Telegram bot support includes:
- Fee for using SpeechKit (see SpeechKit pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fees based on the number of function calls, computing resources allocated for function execution, and outbound traffic (see Cloud Functions pricing).
- Fees based on the API gateway request count and outbound traffic (see API Gateway pricing).
Set up the required resources
- Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
- Download the FFmpeg package archive to ensure the SpeechKit Python SDK works correctly in the function runtime environment.
- Extract the ffmpeg and ffprobe binary files from the archive and make them executable by running the following commands:

  chmod +x ffmpeg
  chmod +x ffprobe

- Create a ZIP archive containing the function code:
  - Create a file named index.py and paste the following code into it.

index.py

import logging
import requests
import telebot
import json
import os
import base64
from speechkit import model_repository, configure_credentials, creds
from speechkit.stt import AudioProcessingType

# Filled in by the handler on each invocation
folder_id = ""
iam_token = ''

# Telegram bot token and the image recognition service endpoint
API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'

# Adding the ffmpeg directory to the system PATH
path = os.environ.get("PATH", "")
os.environ["PATH"] = path + ':/function/code'

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID
def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    # Look up the function the current version belongs to
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}', headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    # Look up the folder the function belongs to
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}', headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

def process_event(event):
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)
    bot.process_new_updates([update])

def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)

    # Authenticating in SpeechKit with an IAM token
    configure_credentials(
        yandex_credentials=creds.YandexCredentials(
            iam_token=iam_token
        )
    )

    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message handlers
@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(
        message,
        "The bot can do the following:\n"
        "* Recognize text in images.\n"
        "* Generate voice messages from text.\n"
        "* Convert voice messages to text."
    )

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    export_path = '/tmp/audio.ogg'
    synthesize(message.text, export_path)
    with open(export_path, 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(downloaded_file)
    bot.reply_to(message, response_text)

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    # The last item in the photo list is used
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition
def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
        "mimeType": "image",
        "languageCodes": ["en", "ru"],
        "model": "page",
        "content": image_data
    })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition
def audio_analyze(audio_data):
    model = model_repository.recognition_model()

    # Recognition settings
    model.model = 'general'
    model.language = 'ru-RU'
    model.audio_processing_type = AudioProcessingType.Full

    result = model.transcribe(audio_data)
    speech_text = [res.normalized_text for res in result]
    return ' '.join(speech_text)

# Speech synthesis
def synthesize(text, export_path):
    model = model_repository.synthesis_model()

    # Synthesis settings
    model.voice = 'kirill'

    result = model.synthesize(text, raw_format=False)
    result.export(export_path, 'ogg')

  - Create a file named requirements.txt. In this file, specify the bot library and the Python SDK library:

    pyTelegramBotAPI==4.27
    yandex-speechkit==1.5.0

  - Add the index.py, requirements.txt, ffmpeg, and ffprobe files to index.zip.
- Create an Object Storage bucket and upload your ZIP archive to it.
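If you prefer to script the packaging steps, the following is a minimal sketch using only the Python standard library. The helper names make_executable and build_archive are hypothetical. The sketch assumes the four files sit in the current directory and that you run it on a Unix-like system, where the execute bits set on ffmpeg and ffprobe are stored in the archive entries.

```python
import os
import stat
import zipfile


def make_executable(path):
    """Equivalent of `chmod +x`: add execute bits for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def build_archive(files, archive_path="index.zip"):
    """Pack the given files at the archive root, keeping their permission bits."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # ZipFile.write records the file mode in the entry's external attributes.
            archive.write(path, arcname=os.path.basename(path))
    return archive_path
```

For example, you could call make_executable("ffmpeg") and make_executable("ffprobe"), then build_archive(["index.py", "requirements.txt", "ffmpeg", "ffprobe"]). Files are placed at the archive root so that index.handler and the /function/code path used in index.py resolve as expected.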
Register your Telegram bot
Register your bot in Telegram and get its token.
- Launch BotFather and send it the following command:

  /newbot

- In the name field, specify the new bot's name. This is the name users will see when chatting with the bot.
- In the username field, specify the new bot's username. You can use it to find the bot in Telegram. The username must end with ...Bot or ..._bot.

In the end, you will get a token. Save it, as you will need it later.
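Before moving on, you may want to sanity-check the username and token you received. The sketch below is illustrative only: the helper names are hypothetical, the username check covers just the suffix rule stated above, and the token pattern (digits, a colon, then letters, digits, underscores, or hyphens) is an assumption based on how bot tokens typically look. The getMe method is part of the Telegram Bot API, and opening the resulting URL should return the bot's profile if the token is valid.

```python
import re

# Assumed token shape: "<numeric bot ID>:<secret>"; not an official specification.
TOKEN_PATTERN = re.compile(r"^\d+:[A-Za-z0-9_-]+$")


def has_bot_suffix(username):
    """Check the suffix rule: the username must end with 'Bot' or '_bot'."""
    return username.lower().endswith("bot")


def looks_like_bot_token(token):
    """Cheap format check that catches copy-paste mistakes such as stray spaces."""
    return bool(TOKEN_PATTERN.match(token))


def get_me_url(token):
    """Build the Bot API URL that returns basic information about the bot."""
    return f"https://api.telegram.org/bot{token}/getMe"


print(has_bot_suffix("RecognizerBot"))
print(looks_like_bot_token("123456:ABC-DEF_example"))
print(get_me_url("<bot_token>"))
```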
Create a function
Create a function that will handle user actions in the chat.
To create the function in the management console:

- In the management console, select the folder where you want to create your function.
- Go to Cloud Functions.
- Create a function:
  - Click Create function.
  - Specify the function name: for-recognizer-bot.
  - Click Create.
- Create a function version:
  - Select Python as the runtime environment, disable Add files with code examples, and click Continue.
  - Specify the Object Storage upload method and select the bucket you created earlier. In the Object field, specify the file name: index.zip.
  - Specify the entry point: index.handler.
  - Under Parameters, specify:
    - Timeout: 30.
    - Memory: 256 MB.
    - Service account: recognizer-bot-sa.
    - Environment variables: TELEGRAM_TOKEN: Your Telegram bot token.
  - Click Save changes.
To create the function using the Yandex Cloud CLI:

If you do not have the Yandex Cloud CLI yet, install and initialize it.
The folder used by default is the one specified when creating the CLI profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id options.

- Create a function named for-recognizer-bot:

  yc serverless function create --name=for-recognizer-bot

  Result:

  id: b09bhaokchn9********
  folder_id: aoek49ghmknn********
  created_at: "2023-03-21T10:03:37.475Z"
  name: for-recognizer-bot
  log_group_id: eolm8aoq9vcp********
  http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
  status: ACTIVE

- Create a version of the for-recognizer-bot function:

  yc serverless function version create \
    --function-name for-recognizer-bot \
    --memory=256m \
    --execution-timeout=30s \
    --runtime=python312 \
    --entrypoint=index.handler \
    --service-account-id=<service_account_ID> \
    --environment TELEGRAM_TOKEN=<bot_token> \
    --package-bucket-name=<bucket_name> \
    --package-object-name=index.zip

  Where:
  - --function-name: Name of the function whose version you are creating.
  - --memory: Amount of RAM.
  - --execution-timeout: Maximum function runtime before timeout.
  - --runtime: Runtime environment.
  - --entrypoint: Entry point.
  - --service-account-id: recognizer-bot-sa service account ID.
  - --environment: Environment variables.
  - --package-bucket-name: Bucket name.
  - --package-object-name: Key of the index.zip file in the bucket.

  Result:

  done (1s)
  id: d4e6qqlh53nu********
  function_id: d4emc80mnp5n********
  created_at: "2025-03-22T16:49:41.800Z"
  runtime: python312
  entrypoint: index.handler
  resources:
    memory: "268435456"
  execution_timeout: 30s
  service_account_id: aje20nhregkc********
  image_size: "4096"
  status: ACTIVE
  tags:
    - $latest
  log_group_id: ckgmc3l93cl0********
  environment:
    TELEGRAM_TOKEN: <bot_token>
  log_options:
    folder_id: b1g86q4m5vej********
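Note that the command takes the memory size as 256m, while the result reports it as 268435456. A quick check, assuming the m suffix denotes binary megabytes (which is consistent with the value shown), confirms that the two figures agree. The function name here is hypothetical.

```python
def megabytes_to_bytes(megabytes):
    """Convert binary megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


print(megabytes_to_bytes(256))  # 268435456
```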
To create the function with Terraform:

Terraform is distributed under the Business Source License. For more information about the provider resources, see the Yandex Cloud provider documentation.
If you do not have Terraform yet, install it and configure the Yandex Cloud provider.

- Describe your function parameters in the configuration file:

  resource "yandex_function" "for-recognizer-bot-function" {
    name               = "for-recognizer-bot"
    user_hash          = "first function"
    runtime            = "python312"
    entrypoint         = "index.handler"
    memory             = "256"
    execution_timeout  = "30"
    service_account_id = "aje20nhregkcvu******"
    environment = {
      TELEGRAM_TOKEN = "<bot_token>"
    }
    package {
      bucket_name = "<bucket_name>"
      object_name = "index.zip"
    }
  }

  Where:
  - name: Function name.
  - user_hash: User-defined string that identifies the function version.
  - runtime: Function runtime environment.
  - entrypoint: Entry point.
  - memory: Amount of memory allocated for the function, in MB.
  - execution_timeout: Function runtime timeout.
  - service_account_id: recognizer-bot-sa service account ID.
  - environment: Environment variables.
  - package: Name of the bucket containing your previously uploaded index.zip archive with the function source code.

  For more information about yandex_function resource properties, see this provider guide.
- Validate your configuration files:
  - In the terminal, navigate to the directory where you created your configuration file.
  - Run a check using the following command:

    terraform plan

  If your configuration is correct, the terminal will display a list of the resources to be created and their settings. Otherwise, Terraform will show any detected errors.
- Deploy the cloud resources:
  - If the configuration is correct, run this command:

    terraform apply

  - To confirm the function creation, type yes in the terminal and press Enter.
To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.
To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
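As a rough illustration of the REST option, the sketch below only builds the request for creating a function and does not send it. It assumes the create method is a POST to the functions collection under the same base URL that index.py uses for its lookups, with folderId and name in the JSON body. Check the API reference before relying on these details. The helper name is hypothetical.

```python
import json
import urllib.request

# Base URL taken from the lookups in index.py; the create path is an assumption.
API_BASE = "https://serverless-functions.api.cloud.yandex.net/functions/v1"


def build_create_function_request(iam_token, folder_id, name):
    """Prepare (but do not send) a request that would create a function."""
    payload = json.dumps({"folderId": folder_id, "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/functions",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_function_request("<IAM_token>", "<folder_ID>", "for-recognizer-bot")
print(request.get_method(), request.full_url)
```

To actually send the request, you would pass it to urllib.request.urlopen with real values in place of the placeholders.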
Create an API gateway
The Telegram server will notify your bot of new messages via a webhook. The API gateway will accept these requests and forward them to the for-recognizer-bot function for processing.
To create the API gateway in the management console:

- In the management console, select the folder where you want to create an API gateway.
- Go to API Gateway.
- Click Create API gateway.
- In the Name field, specify recognizer-bot-api-gw.
- Under Specification, add the following specification:

  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function

  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.

- Click Create.
- Select the previously created API gateway. Save the Default domain value, as you will need it later.
To create the API gateway using the Yandex Cloud CLI:

- Save the following specification to spec.yaml:

  openapi: 3.0.0
  info:
    title: Sample API
    version: 1.0.0
  paths:
    /for-recognizer-bot-function:
      post:
        x-yc-apigateway-integration:
          type: cloud_functions
          function_id: <function_ID>
          service_account_id: <service_account_ID>
        operationId: for-recognizer-bot-function

  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.

- Run this command:

  yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml

  Where:
  - --name: API gateway name.
  - --spec: Specification file.

  Result:

  done (5s)
  id: d5d1ud9bli1e********
  folder_id: b1gc1t4cb638********
  created_at: "2023-09-25T16:01:48.926Z"
  name: recognizer-bot-api-gw
  status: ACTIVE
  domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
  log_group_id: ckgefpleo5eg********
  connectivity: {}
  log_options:
    folder_id: b1gc1t4cb638********
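If you would rather not edit the placeholders by hand, a small script can fill them in. The following is a minimal sketch with hypothetical helper names. It reuses the specification shown above verbatim and substitutes only the function ID and the service account ID.

```python
SPEC_TEMPLATE = """\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: {function_id}
        service_account_id: {service_account_id}
      operationId: for-recognizer-bot-function
"""


def render_spec(function_id, service_account_id):
    """Return the gateway specification with both IDs filled in."""
    return SPEC_TEMPLATE.format(
        function_id=function_id,
        service_account_id=service_account_id,
    )


def write_spec(path, function_id, service_account_id):
    """Write the rendered specification to a file such as spec.yaml."""
    with open(path, "w", encoding="utf-8") as spec_file:
        spec_file.write(render_spec(function_id, service_account_id))
```

Calling write_spec("spec.yaml", "<function_ID>", "<service_account_ID>") with your real IDs produces a file you can pass to the --spec option.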
To create an API gateway with Terraform:

- Specify the yandex_api_gateway resource parameters in the configuration file:

  resource "yandex_api_gateway" "recognizer-bot-api-gw" {
    name = "recognizer-bot-api-gw"
    spec = <<-EOT
      openapi: 3.0.0
      info:
        title: Sample API
        version: 1.0.0
      paths:
        /for-recognizer-bot-function:
          post:
            x-yc-apigateway-integration:
              type: cloud_functions
              function_id: <function_ID>
              service_account_id: <service_account_ID>
            operationId: for-recognizer-bot-function
    EOT
  }

  Where:
  - name: API gateway name.
  - spec: API gateway specification.

  For more information about Terraform resource parameters, see this provider guide.
- Validate your configuration files:
  - In the terminal, navigate to the directory where you created your configuration file.
  - Run a check using the following command:

    terraform plan

  If your configuration is correct, the terminal will display a list of the resources to be created and their settings. Otherwise, Terraform will show any detected errors.
- Deploy the cloud resources:
  - If the configuration is correct, run this command:

    terraform apply

  - To confirm resource creation, type yes and press Enter.
To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.
Configure a link between the function and the Telegram bot
Set up a webhook for your Telegram bot:
curl --request POST \
--url 'https://api.telegram.org/bot<bot_token>/setWebhook' \
--header 'content-type: application/json' \
--data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'
Where:
- <bot_token>: Telegram bot token.
- <API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway.
Result:
{"ok":true,"result":true,"description":"Webhook was set"}
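The same call can be prepared from Python. The sketch below is a hedged equivalent of the curl command above: it builds the setWebhook request without sending it, and it normalizes the gateway domain so that the webhook URL always uses HTTPS, which Telegram requires for webhooks. The helper names are hypothetical, and the path matches the one in the gateway specification.

```python
import json
import urllib.request


def webhook_url(gateway_domain, path="/for-recognizer-bot-function"):
    """Build the HTTPS webhook URL from the gateway domain, with or without a scheme."""
    domain = gateway_domain.strip().rstrip("/")
    for scheme in ("https://", "http://"):
        if domain.startswith(scheme):
            domain = domain[len(scheme):]
    return f"https://{domain}{path}"


def build_set_webhook_request(bot_token, gateway_domain):
    """Prepare (but do not send) the setWebhook call for the bot."""
    body = json.dumps({"url": webhook_url(gateway_domain)}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


print(webhook_url("example.apigw.yandexcloud.net"))
```

Passing the prepared request to urllib.request.urlopen sends it. The example domain is a placeholder, so substitute the Default domain value you saved earlier.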
Test your bot
Chat with the bot:
- Open Telegram and find the bot by its username.
- Send /start to the chat. The bot should respond with:

  The bot can do the following:
  * Recognize text in images.
  * Generate voice messages from text.
  * Convert voice messages to text.

- Send a text message to the chat. The bot will respond with a voice message generated from your text.
- Send a voice message to the chat. The bot will respond with a text message transcribed from your speech.
- Send an image containing text to the chat. The bot will respond with a message containing the recognized text.

Note
The image must meet the Vision OCR image requirements.
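To check how the bot assembles its reply to an image without calling the service, you can run the parsing step on its own. This sketch mirrors the loop in the image_analyze function of index.py, under the assumption that the response nests words inside lines inside blocks, as that function expects. The sample_response dictionary is hand-written for illustration and is not real Vision OCR output.

```python
def extract_text(ocr_response):
    """Join recognized words into lines, one output line per recognized line."""
    blocks = ocr_response["result"]["textAnnotation"].get("blocks", [])
    lines = []
    for block in blocks:
        for line in block.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)


# Hand-written sample in the shape index.py expects; not real service output.
sample_response = {
    "result": {
        "textAnnotation": {
            "blocks": [
                {"lines": [{"words": [{"text": "Hello"}, {"text": "world"}]}]},
                {"lines": [{"words": [{"text": "Second"}, {"text": "line"}]}]},
            ]
        }
    }
}

print(extract_text(sample_response))
```

Unlike the original loop, this variant does not leave a trailing space after each line, and it returns an empty string when no blocks are present.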
How to delete the resources you created
To avoid incurring charges for resources you no longer need, delete them: