Developing a Telegram bot for text recognition in images and for speech synthesis and recognition
In this tutorial, you will create a bot for Telegram that can:
- Synthesize speech from the text of a message using the Yandex SpeechKit API v1.
- Recognize speech in voice messages and convert it into text using the Yandex SpeechKit synchronous recognition API.
- Recognize text in images using Yandex Vision OCR.
An API gateway in Yandex API Gateway will receive requests that Telegram sends to the bot and forward them to a function in Yandex Cloud Functions for processing.
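The following Python sketch shows how the function picks a service for each incoming message. It follows the order in which index.py registers its message handlers: the /start and /help commands first, then text, voice, and photo messages. `route` is a hypothetical stdlib helper that takes the message object of a Telegram update as a plain dict. It is not telebot's own dispatch logic.

```python
def route(message: dict) -> str:
    """Return which part of the bot handles a Telegram message.

    Hypothetical helper that mirrors the handler order in index.py:
    /start and /help first, then text, voice, and photo messages.
    """
    text = message.get("text")
    if text is not None:
        words = text.split()
        # Commands may carry a bot suffix, for example /start@my_bot
        command = words[0].split("@")[0] if words else ""
        if command in ("/start", "/help"):
            return "welcome message"
        return "speech synthesis (SpeechKit)"
    if "voice" in message:
        return "speech recognition (SpeechKit)"
    if "photo" in message:
        return "text recognition (Vision OCR)"
    # index.py registers no handler for other content types
    return "ignored"
```

For example, `route({"text": "/start"})` returns `"welcome message"`. Any other text, including unknown commands, goes to speech synthesis.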
To create a bot:
- Prepare your cloud.
- Create resources.
- Register the Telegram bot.
- Create a function.
- Create an API gateway.
- Link the function and the bot.
- Test the bot.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page and create or select a folder for your infrastructure.
Learn more about clouds and folders.
Required paid resources
The cost of Telegram bot support includes:
- Fee for using SpeechKit (see SpeechKit pricing).
- Fee for using Vision OCR (see Vision OCR pricing).
- Fee for the number of function calls, computing resources allocated to executing the function, and outgoing traffic (see Cloud Functions pricing).
- Fee for the number of requests to the API gateway and outgoing traffic (see API Gateway pricing).
Create resources
- Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
- Prepare a ZIP archive with the function code:
  - Create a file named index.py and paste the code below into it.
index.py
import logging
import requests
import telebot
import json
import os
import base64

# Service endpoints and authentication data
API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'
speechkit_url = 'https://stt.api.cloud.yandex.net/speech/v1/stt:recognize'
speechkit_synthesis_url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'

# Filled in by handler() on every invocation
folder_id = ""
iam_token = ''

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID
def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    # Find the function ID by the version ID
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                   headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    # Find the folder ID by the function ID
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                 headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

def process_event(event):
    # The request body contains the Telegram update as a JSON string
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)
    bot.process_new_updates([update])

def handler(event, context):
    global iam_token, folder_id
    # IAM token of the service account specified for the function version
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)
    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message listeners
@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n"
                 "* Recognize text from images.\n"
                 "* Generate voice messages from text.\n"
                 "* Convert voice messages to text.")

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    global iam_token, folder_id
    # Save the synthesized speech to the function's temporary directory
    with open('/tmp/audio.ogg', "wb") as f:
        for audio_content in synthesize(folder_id, iam_token, message.text):
            f.write(audio_content)
    # The with block closes the file after it has been sent
    with open('/tmp/audio.ogg', 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(speechkit_url, iam_token, folder_id, downloaded_file)
    bot.reply_to(message, response_text)

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    # The last element is the largest available size of the photo
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition
def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url,
                             headers={'Authorization': 'Bearer ' + iam_token, 'x-folder-id': folder_id},
                             json={
                                 "mimeType": "image",
                                 "languageCodes": ["en", "ru"],
                                 "model": "page",
                                 "content": image_data
                             })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition
def audio_analyze(speechkit_url, iam_token, folder_id, audio_data):
    headers = {'Authorization': f'Bearer {iam_token}'}
    params = {
        "topic": "general",
        "folderId": f"{folder_id}",
        "lang": "ru-RU"}
    audio_request = requests.post(speechkit_url, params=params, headers=headers, data=audio_data)
    responseData = audio_request.json()
    response = 'error'
    if responseData.get("error_code") is None:
        response = (responseData.get("result"))
    return response

# Speech synthesis
def synthesize(folder_id, iam_token, text):
    headers = {
        'Authorization': 'Bearer ' + iam_token,
    }
    data = {
        'text': text,
        'lang': 'ru-RU',
        'voice': 'filipp',
        'folderId': folder_id
    }
    # Stream the response and yield the audio chunk by chunk
    with requests.post(speechkit_synthesis_url, headers=headers, data=data, stream=True) as resp:
        if resp.status_code != 200:
            raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
        for chunk in resp.iter_content(chunk_size=None):
            yield chunk
  - Create a file named requirements.txt and specify in it the library for working with the bot: pyTelegramBotAPI. This package provides the telebot module imported in index.py.
  - Add both files to the index.zip archive.
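To build the archive with a script, you can use this Python sketch based on the standard zipfile module. It assumes index.py and requirements.txt are in the current directory. `build_archive` is a hypothetical helper name. The sketch stores each file at the root of the archive, which lets the index.handler entry point find the index module. If you zip the parent folder instead, the files end up under a directory prefix.

```python
import zipfile
from pathlib import Path


def build_archive(files, archive_path="index.zip"):
    """Pack the given files into a ZIP archive, each one at the archive root."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            path = Path(name)
            # arcname=path.name drops any directory prefix, so index.py lands
            # at the root, where the index.handler entry point looks for it
            zf.write(path, arcname=path.name)
    return archive_path
```

Usage: `build_archive(["index.py", "requirements.txt"])` creates index.zip next to the source files.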
Register the Telegram bot
Register the bot in Telegram and get a token.
- Open the BotFather bot in Telegram and send it the following command: /newbot
- In the name field, specify the name of the bot you are creating. This is the name users will see when communicating with the bot.
- In the username field, specify the username of the bot you are creating. You can use the username to search for the bot in Telegram. The username must end with ...Bot or ..._bot.
As a result, you will get a token. Save it. You will need it later.
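Before you message BotFather, you can pre-check the username locally. The sketch below makes an assumption about the rules: 5 to 32 characters made up of Latin letters, digits, and underscores, which are the general Telegram username rules. It combines these with the requirement above that the name end in "bot" in any letter case. `looks_like_valid_bot_username` is a hypothetical helper. BotFather may apply further checks, for example whether the name is already taken.

```python
import re

# Assumed general Telegram username rules: 5-32 characters,
# Latin letters, digits, and underscores only
_USERNAME_RE = re.compile(r"[A-Za-z0-9_]{5,32}")


def looks_like_valid_bot_username(username: str) -> bool:
    """Pre-check a bot username before sending it to BotFather."""
    # The bot-specific rule: ...Bot or ..._bot, case-insensitive
    return _USERNAME_RE.fullmatch(username) is not None and username.lower().endswith("bot")
```

For example, `looks_like_valid_bot_username("recognizer_bot")` returns `True`. `looks_like_valid_bot_username("recognizer")` returns `False`.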
Create a function
Create a function to process user actions in the chat.
- In the management console, select the folder where you want to create a function.
- In the list of services, select Cloud Functions.
- Create a function:
  - Click Create function.
  - Enter the function name: for-recognizer-bot.
  - Click Create.
- Create a function version:
  - Select the Python runtime environment, disable the Add files with code examples option, and click Continue.
  - Specify the ZIP archive upload method and select the index.zip archive prepared earlier.
  - Specify the entry point: index.handler.
  - Under Parameters, specify:
    - Timeout, sec: 30
    - Memory: 128 MB
    - Service account: recognizer-bot-sa
    - Environment variables: TELEGRAM_TOKEN: your Telegram bot token
  - Click Save changes.
Using the CLI:
- Create a function named for-recognizer-bot:
yc serverless function create --name=for-recognizer-bot
Result:
id: b09bhaokchn9********
folder_id: aoek49ghmknn********
created_at: "2023-03-21T10:03:37.475Z"
name: for-recognizer-bot
log_group_id: eolm8aoq9vcp********
http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
status: ACTIVE
- Create a version of the for-recognizer-bot function:
yc serverless function version create \
  --function-name for-recognizer-bot \
  --memory=128m \
  --execution-timeout=30s \
  --runtime=python312 \
  --entrypoint=index.handler \
  --service-account-id=<service_account_ID> \
  --environment TELEGRAM_TOKEN=<bot_token> \
  --source-path=./index.zip
Where:
  - --function-name: Name of the function whose version you are creating.
  - --memory: Amount of RAM.
  - --execution-timeout: Maximum function running time before the timeout is reached.
  - --runtime: Runtime environment.
  - --entrypoint: Entry point.
  - --service-account-id: recognizer-bot-sa service account ID.
  - --environment: Environment variables.
  - --source-path: Path to the index.zip archive.
Result:
done (1s)
id: d4e6qqlh53nu********
function_id: d4emc80mnp5n********
created_at: "2023-03-22T16:49:41.800Z"
runtime: python312
entrypoint: index.handler
resources:
  memory: "134217728"
execution_timeout: 30s
service_account_id: aje20nhregkc********
image_size: "4096"
status: ACTIVE
tags:
  - $latest
log_group_id: ckgmc3l93cl0********
environment:
  TELEGRAM_TOKEN: <bot_token>
log_options:
  folder_id: b1g86q4m5vej********
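You will rerun the version command each time you change index.py, so it can help to script it. The sketch below builds the same argument list as the command above so you can pass it to `subprocess.run`. It uses only the flags and values shown in this tutorial. `version_create_args` is a hypothetical helper. The sketch also shows where the `memory: "134217728"` value in the output comes from: the memory is reported in bytes, and 128 MB is 128 × 1024 × 1024 bytes.

```python
# 128 MB expressed in bytes: the resources.memory value shown in the output
MEMORY_BYTES = 128 * 1024 * 1024


def version_create_args(service_account_id, bot_token, source_path="./index.zip"):
    """Return the argv of the `yc serverless function version create` command above.

    Hypothetical helper; the flag values are the ones used in this tutorial.
    """
    return [
        "yc", "serverless", "function", "version", "create",
        "--function-name", "for-recognizer-bot",
        "--memory=128m",
        "--execution-timeout=30s",
        "--runtime=python312",
        "--entrypoint=index.handler",
        f"--service-account-id={service_account_id}",
        "--environment", f"TELEGRAM_TOKEN={bot_token}",
        f"--source-path={source_path}",
    ]
```

Usage: `subprocess.run(version_create_args("<service_account_ID>", os.environ["TELEGRAM_TOKEN"]), check=True)`. A list argument bypasses the shell, so you do not need to quote the token.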
Using Terraform:
- In the configuration file, describe the function parameters:
resource "yandex_function" "for-recognizer-bot-function" {
  name               = "for-recognizer-bot"
  user_hash          = "first function"
  runtime            = "python312"
  entrypoint         = "index.handler"
  memory             = "128"
  execution_timeout  = "30"
  service_account_id = "<service_account_ID>"
  environment = {
    TELEGRAM_TOKEN = "<bot_token>"
  }
  content {
    zip_filename = "./index.zip"
  }
}
Where:
  - name: Function name.
  - user_hash: Any string to identify the function version.
  - runtime: Function runtime environment.
  - entrypoint: Entry point.
  - memory: Amount of memory allocated for the function, in MB.
  - execution_timeout: Function execution timeout, in seconds.
  - service_account_id: recognizer-bot-sa service account ID.
  - environment: Environment variables.
  - content: Path to the index.zip archive with the function source code.
For more information about the yandex_function resource parameters, see the relevant provider documentation.
- Make sure the configuration files are correct.
  - In the command line, go to the folder where you created the configuration file.
  - Run a check using this command:
terraform plan
  If the configuration is described correctly, the terminal will display a list of created resources and their parameters. If the configuration contains any errors, Terraform will point them out.
- Deploy cloud resources.
  - If the configuration does not contain any errors, run this command:
terraform apply
  - Confirm creating the function: type yes in the terminal and press Enter.
To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.
To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
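To try the handler outside Cloud Functions, you need an event and a context object that match what index.py reads: the update JSON under event['body'], plus context.token['access_token'] and context.function_version. The sketch below builds these objects with the standard library. `FakeContext`, `make_event`, and `make_start_update` are hypothetical helpers. A real API Gateway event carries more fields, such as headers, than the single body key used here. Calling index.handler with these objects still sends real requests to Yandex Cloud and Telegram.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeContext:
    """Hypothetical stand-in for the Cloud Functions context object.

    Provides only the two attributes that index.handler reads.
    """
    function_version: str
    token: dict = field(default_factory=dict)


def make_event(update: dict) -> dict:
    """Wrap a Telegram update the way index.process_event expects: JSON under "body"."""
    return {"body": json.dumps(update)}


def make_start_update(chat_id: int, update_id: int = 1) -> dict:
    """Build a minimal Telegram update that carries the /start command."""
    return {
        "update_id": update_id,
        "message": {
            "message_id": 1,
            "date": 0,
            "chat": {"id": chat_id, "type": "private"},
            "from": {"id": chat_id, "is_bot": False, "first_name": "Test"},
            "text": "/start",
            "entities": [{"type": "bot_command", "offset": 0, "length": 6}],
        },
    }
```

Usage, with TELEGRAM_TOKEN set so that index.py can be imported: `index.handler(make_event(make_start_update(<chat_ID>)), FakeContext("<version_ID>", {"access_token": "<IAM_token>"}))`. If the token, version ID, and chat ID are valid, this should send the welcome message to that chat.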
Create an API gateway
The Telegram server will notify your bot of new messages using a webhook. The API gateway will receive these requests and forward them to the for-recognizer-bot function for processing.
- In the management console, select the folder where you want to create an API gateway.
- In the list of services, select API Gateway.
- Click Create API gateway.
- In the Name field, enter recognizer-bot-api-gw.
- In the Specification section, add the specification:
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.
- Click Create.
- Select the created API gateway. Save the Default domain field value from the General information section. You will need it later.
Using the CLI:
- Save the following specification to the spec.yaml file:
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
  Where:
  - function_id: for-recognizer-bot function ID.
  - service_account_id: recognizer-bot-sa service account ID.
- Run this command:
yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
  Where:
  - --name: API gateway name.
  - --spec: Specification file.
Result:
done (5s)
id: d5d1ud9bli1e********
folder_id: b1gc1t4cb638********
created_at: "2023-09-25T16:01:48.926Z"
name: recognizer-bot-api-gw
status: ACTIVE
domain: d5d1ud9bli1e********.apigw.yandexcloud.net
log_group_id: ckgefpleo5eg********
connectivity: {}
log_options:
  folder_id: b1gc1t4cb638********
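The specification has only two values to fill in. If you deploy from scripts, you can render spec.yaml from a template with plain string formatting, as in the sketch below. `render_spec` and `write_spec` are hypothetical helpers. The sketch assumes the IDs contain only letters and digits, like the IDs in this tutorial, so they need no YAML quoting.

```python
SPEC_TEMPLATE = """\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: {function_id}
        service_account_id: {service_account_id}
      operationId: for-recognizer-bot-function
"""


def render_spec(function_id: str, service_account_id: str) -> str:
    """Substitute the function and service account IDs into the specification."""
    return SPEC_TEMPLATE.format(function_id=function_id,
                                service_account_id=service_account_id)


def write_spec(function_id: str, service_account_id: str, path: str = "spec.yaml") -> str:
    """Write the rendered specification to a file and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_spec(function_id, service_account_id))
    return path
```

Usage: `write_spec("<function_ID>", "<service_account_ID>")`, then run the `yc serverless api-gateway create` command above.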
Using Terraform:
To create an API gateway:
- Describe the parameters of the yandex_api_gateway resource in the configuration file:
resource "yandex_api_gateway" "recognizer-bot-api-gw" {
  name = "recognizer-bot-api-gw"
  spec = <<-EOT
    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0
    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
  EOT
}
  Where:
  - name: API gateway name.
  - spec: API gateway specification.
  For more information about the resource parameters in Terraform, see the relevant provider documentation.
- Make sure the configuration files are correct.
  - In the command line, go to the folder where you created the configuration file.
  - Run a check using this command:
terraform plan
  If the configuration is described correctly, the terminal will display a list of created resources and their parameters. If the configuration contains any errors, Terraform will point them out.
- Deploy cloud resources.
  - If the configuration does not contain any errors, run this command:
terraform apply
  - Confirm creating the resources: type yes in the terminal and press Enter.
To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.
Configure a link between the function and the Telegram bot
Install a webhook for your Telegram bot:
curl --request POST \
--url https://api.telegram.org/bot<bot_token>/setWebhook \
--header 'content-type: application/json' \
--data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'
Where:
- <bot_token>: Telegram bot token.
- <API_gateway_domain>: Domain of the recognizer-bot-api-gw API gateway, i.e., the Default domain value you saved earlier or the domain field from the CLI output.
Result:
{"ok":true,"result":true,"description":"Webhook was set"}
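You can make the same call from Python using only the standard library. The sketch below builds the request that the curl command sends. `build_set_webhook_request` and `set_webhook` are hypothetical helper names. The Telegram Bot API expects an HTTPS URL for webhooks, so the sketch adds https:// if the domain has no scheme. This assumes the gateway's default domain is served over HTTPS.

```python
import json
import urllib.request


def build_set_webhook_request(bot_token: str, gateway_domain: str) -> urllib.request.Request:
    """Build the same setWebhook POST request as the curl command above."""
    # Normalize the domain: drop any scheme and trailing slash, then use HTTPS
    domain = gateway_domain.removeprefix("https://").removeprefix("http://").rstrip("/")
    webhook_url = f"https://{domain}/for-recognizer-bot-function"
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=json.dumps({"url": webhook_url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_webhook(bot_token: str, gateway_domain: str) -> dict:
    """Send the request and return Telegram's JSON reply as a dict."""
    with urllib.request.urlopen(build_set_webhook_request(bot_token, gateway_domain)) as resp:
        return json.load(resp)
```

Usage: `set_webhook("<bot_token>", "<API_gateway_domain>")` should return a dict similar to the result shown above. The helpers use `str.removeprefix`, which requires Python 3.9 or higher.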
Test the bot
Talk to the bot:
- Open Telegram and search for the bot by the username you specified.
- Send the /start message to the chat.
  The bot should respond with:
  The bot can do the following:
  * Recognize text from images.
  * Generate voice messages from text.
  * Convert voice messages to text.
- Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
- Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech.
- Send an image with text to the chat. The bot will respond with a message containing the recognized text.
Note
The image must meet the requirements.
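If the reply to an image looks unexpected, it helps to know how the reply is built. image_analyze in index.py walks result.textAnnotation.blocks, then lines, then words. The sketch below is a tidier variant of that loop: it joins words with single spaces and lines with newlines, and leaves no trailing whitespace. It also adds a fallback for images with no recognized text, because Telegram does not accept empty messages. `extract_text`, `reply_text`, and the fallback wording are hypothetical. The sample contains only the fields the code reads, not a full API response.

```python
def extract_text(ocr_response: dict) -> str:
    """Join recognized words into lines, walking blocks -> lines -> words."""
    lines = []
    for block in ocr_response["result"]["textAnnotation"]["blocks"]:
        for line in block["lines"]:
            lines.append(" ".join(word["text"] for word in line["words"]))
    return "\n".join(lines)


def reply_text(ocr_response: dict, fallback: str = "No text found in the image.") -> str:
    """Return the recognized text, or the fallback if nothing was recognized."""
    text = extract_text(ocr_response)
    return text if text.strip() else fallback


# Minimal sample with only the fields read above
sample = {
    "result": {
        "textAnnotation": {
            "blocks": [
                {"lines": [
                    {"words": [{"text": "Hello"}, {"text": "world"}]},
                    {"words": [{"text": "Line"}, {"text": "two"}]},
                ]}
            ]
        }
    }
}
```

For example, `reply_text(sample)` returns `"Hello world\nLine two"`.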
How to delete the resources you created
To stop paying for the resources you created: