Developing a Telegram bot for text recognition in images, audio synthesis and recognition

Updated at November 26, 2024

In this tutorial, you will create a bot for Telegram that can:

Synthesize speech from a message text using the Yandex SpeechKit API v1.
Recognize speech in voice messages and convert it into text using the Yandex SpeechKit synchronous recognition API.
Recognize text in images using Yandex Vision OCR.

Yandex API Gateway will receive the requests that Telegram sends for the bot and forward them to a Yandex Cloud Functions function for processing.

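To create a bot:

Get started.
Create resources.
Register the Telegram bot.
Create a function.
Create an API gateway.
Configure a link between the function and the Telegram bot.
Test the bot.

If you no longer need the resources you created, delete them.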

Getting started

Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.

If you have an active billing account, you can go to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The cost of Telegram bot support includes:

Fee for using SpeechKit (see SpeechKit pricing).
Fee for using Vision OCR (see Vision OCR pricing).
Fee for the number of function calls, computing resources allocated to executing the function, and outgoing traffic (see Cloud Functions pricing).
Fee for the number of requests to the API gateway and outgoing traffic (see API Gateway pricing).

Create resources

Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.

Prepare a ZIP archive with the function code:

Create a file named index.py and paste the code below to it.

index.py

import logging
import requests
import telebot
import json
import os
import base64

# Service endpoints and authentication data

API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'
speechkit_url = 'https://stt.api.cloud.yandex.net/speech/v1/stt:recognize'
speechkit_synthesis_url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
folder_id = ""
iam_token = ''

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID

def get_folder_id(iam_token, version_id):

    headers = {'Authorization': f'Bearer {iam_token}'}
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                   headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                 headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

def process_event(event):

    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)

    bot.process_new_updates([update])

def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)
    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message listeners

@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n* Recognize text from images.\n* Generate voice messages from text.\n* Convert voice messages to text.")

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    global iam_token, folder_id
    with open('/tmp/audio.ogg', "wb") as f:
        for audio_content in synthesize(folder_id, iam_token, message.text):
            f.write(audio_content)
    # Reopen the synthesized file for reading; the context manager closes it after sending
    with open('/tmp/audio.ogg', 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(speechkit_url, iam_token, folder_id, downloaded_file)
    bot.reply_to(message, response_text)

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition

def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
        "mimeType": "image",
        "languageCodes": ["en", "ru"],
        "model": "page",
        "content": image_data
        })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition

def audio_analyze(speechkit_url, iam_token, folder_id, audio_data):
    headers = {'Authorization': f'Bearer {iam_token}'}
    params = {
        "topic": "general",
        "folderId": f"{folder_id}",
        "lang": "ru-RU"}

    audio_request = requests.post(speechkit_url, params=params, headers=headers, data=audio_data)
    responseData = audio_request.json()
    response = 'error'
    if responseData.get("error_code") is None:
        response = (responseData.get("result"))
    return response

# Speech synthesis

def synthesize(folder_id, iam_token, text):
    headers = {
        'Authorization': 'Bearer ' + iam_token,
    }

    data = {
        'text': text,
        'lang': 'ru-RU',
        'voice': 'filipp',
        'folderId': folder_id
    }

    with requests.post(speechkit_synthesis_url, headers=headers, data=data, stream=True) as resp:
        if resp.status_code != 200:
            raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))

        for chunk in resp.iter_content(chunk_size=None):
            yield chunk

Create a file named requirements.txt and list in it the library the bot depends on:
```
telebot
```
Add both files to the index.zip archive.
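If you prefer to script the packaging step, the archive can also be built with the Python standard library. The following is a minimal sketch, not part of the original tutorial: the `build_archive` helper name and its parameters are hypothetical, and it assumes both files are in the same directory.

```python
import zipfile
from pathlib import Path


def build_archive(source_dir, archive_path="index.zip",
                  files=("index.py", "requirements.txt")):
    """Pack the listed files into a ZIP archive, keeping them at the archive root.

    Hypothetical helper for illustration; file names follow the tutorial.
    """
    source_dir = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            path = source_dir / name
            if not path.is_file():
                raise FileNotFoundError(f"Missing file: {path}")
            # arcname stores the file at the top level of the archive,
            # so the index.handler entry point can find index.py.
            archive.write(path, arcname=name)
    return Path(archive_path)
```

Calling `build_archive(".")` in the directory with index.py and requirements.txt should produce index.zip with both files at the top level rather than inside a subfolder.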

Register the Telegram bot

Open a chat with BotFather in Telegram and send it the following command:
```
/newbot
```
In the name field, specify the name of the bot you are creating. This is the name users will see when communicating with the bot.
In the username field, specify the username of the bot you are creating. You can use the username to search for the bot in Telegram. The username must end with ...Bot or ..._bot.

As a result, you will get a token. Save it. You will need it later.
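Before sending a username to BotFather, you can check it locally. The sketch below is illustrative only: the `is_valid_bot_username` helper is hypothetical, and the length and character rules (5 to 32 characters, Latin letters, digits, and underscores) are an assumption based on Telegram's general username rules; this tutorial itself only states the required ending.

```python
import re

# Assumed rule set: 5-32 characters, Latin letters, digits and underscores.
# Only the "bot" ending is stated in this tutorial.
_USERNAME_RE = re.compile(r"[A-Za-z0-9_]{5,32}")


def is_valid_bot_username(username):
    """Return True if the username looks acceptable for a Telegram bot."""
    # fullmatch rejects any extra characters, including a trailing newline.
    if not _USERNAME_RE.fullmatch(username):
        return False
    # Covers both the "...Bot" and "..._bot" endings.
    return username.lower().endswith("bot")
```

For example, `is_valid_bot_username("recognizer_bot")` passes this check, while `is_valid_bot_username("recognizer")` does not. BotFather remains the final authority and also rejects usernames that are already taken.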

Create a function

Create a function to process user actions in the chat.

You can do this using the management console, the Yandex Cloud CLI, Terraform, or the API. The instructions for each option follow in that order.

In the management console, select the folder where you want to create a function.
In the list of services, select Cloud Functions.
Create a function:
1. Click Create function.
2. Enter the function name: for-recognizer-bot.
3. Click Create.
Create a function version:
1. Select the Python runtime environment, disable the Add files with code examples option, and click Continue.
2. Specify the ZIP archive upload method and select the index.zip archive prepared earlier.
3. Specify the entry point: index.handler.
4. Under Parameters, specify:
  - Timeout, sec: 30
  - Memory: 128 MB
  - Service account: recognizer-bot-sa
  - Environment variables:
    - TELEGRAM_TOKEN: Your Telegram bot token
5. Click Save changes.

Using the Yandex Cloud CLI, create a function named for-recognizer-bot:

yc serverless function create --name=for-recognizer-bot

Result:

id: b09bhaokchn9********
folder_id: aoek49ghmknn********
created_at: "2023-03-21T10:03:37.475Z"
name: for-recognizer-bot
log_group_id: eolm8aoq9vcp********
http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
status: ACTIVE

Create a version of the for-recognizer-bot function:

yc serverless function version create \
  --function-name for-recognizer-bot \
  --memory=128m \
  --execution-timeout=30s \
  --runtime=python312 \
  --entrypoint=index.handler \
  --service-account-id=<service_account_ID> \
  --environment TELEGRAM_TOKEN=<bot_token> \
  --source-path=./index.zip

Where:

--function-name: Name of the function a version of which you are creating.
--memory: Amount of RAM.
--execution-timeout: Maximum function running time before the timeout is reached.
--runtime: Runtime environment.
--entrypoint: Entry point.
--service-account-id: recognizer-bot-sa service account ID.
--environment: Environment variables.
--source-path: Path to the index.zip archive.

Result:

done (1s)
id: d4e6qqlh53nu********
function_id: d4emc80mnp5n********
created_at: "2023-03-22T16:49:41.800Z"
runtime: python312
entrypoint: index.handler
resources:
  memory: "134217728"
execution_timeout: 30s
service_account_id: aje20nhregkc********
image_size: "4096"
status: ACTIVE
tags:
  - $latest
log_group_id: ckgmc3l93cl0********
environment:
  TELEGRAM_TOKEN: <bot_token>
log_options:
  folder_id: b1g86q4m5vej********

To use Terraform, describe the function parameters in the configuration file:
```
resource "yandex_function" "for-recognizer-bot-function" {
  name               = "for-recognizer-bot"
  user_hash          = "first function"
  runtime            = "python312"
  entrypoint         = "index.handler"
  memory             = "128"
  execution_timeout  = "30"
  service_account_id = "aje20nhregkcvu******"
  environment = {
    TELEGRAM_TOKEN = "<bot_token>"
  }
  content {
    zip_filename = "./index.zip"
  }
}
```
Where:
- name: Function name.
- user_hash: Any string to identify the function version.
- runtime: Function runtime environment.
- entrypoint: Entry point.
- memory: Amount of memory allocated for the function, in MB.
- execution_timeout: Function execution timeout.
- service_account_id: recognizer-bot-sa service account ID.
- environment: Environment variables.
- content: Path to the index.zip archive with the function source code.
For more information about the yandex_function resource parameters, see the relevant provider documentation.
Make sure the configuration files are correct.
1. In the command line, go to the folder where you created the configuration file.
2. Run a check using this command:
```
terraform plan
```
If the configuration is described correctly, the terminal will display a list of created resources and their parameters. If the configuration contains any errors, Terraform will point them out.
Deploy cloud resources.
1. If the configuration does not contain any errors, run this command:
```
terraform apply
```
2. Confirm creating the function: type yes in the terminal and press Enter.

To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.

To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.

Create an API gateway

The Telegram server will notify your bot of new messages using a webhook. The API gateway will accept requests on the bot side and redirect them to the for-recognizer-bot function for processing.
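The function receives each webhook call as an event whose `body` field holds the Telegram update as a JSON string, which is what `process_event` in index.py parses. The sketch below shows a more defensive way to read it. It is an illustration, not the tutorial's code: the `extract_update` name is hypothetical, and the `isBase64Encoded` flag is an assumption about the event structure that you should verify against the Cloud Functions documentation.

```python
import base64
import json


def extract_update(event):
    """Return the Telegram update as a dict, or None if the body is unusable."""
    body = event.get("body")
    if not body:
        return None
    try:
        # Assumption: encoded bodies are marked with an isBase64Encoded flag.
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        return json.loads(body)
    except ValueError:
        # Covers invalid base64, invalid UTF-8, and invalid JSON alike.
        return None
```

Returning `None` instead of raising lets the handler still answer with status code 200 for a malformed request, so one bad call does not surface as a function error.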

You can do this using the management console, the CLI, Terraform, or the API. The instructions for each option follow in that order.

In the management console, select the folder where you want to create an API gateway.
In the list of services, select API Gateway.
Click Create API gateway.
In the Name field, enter recognizer-bot-api-gw.

In the Specification section, add the specification:

openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.

Click Create.
Select the created API gateway. Save the Default domain field value. You will need it later.

To use the CLI, save the following specification to the spec.yaml file:

openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.

Run this command:

yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml

Where:

--name: API gateway name.
--spec: Specification file.

Result:

done (5s)
id: d5d1ud9bli1e********
folder_id: b1gc1t4cb638********
created_at: "2023-09-25T16:01:48.926Z"
name: recognizer-bot-api-gw
status: ACTIVE
domain: d5d1ud9bli1e********.apigw.yandexcloud.net
log_group_id: ckgefpleo5eg********
connectivity: {}
log_options:
  folder_id: b1gc1t4cb638********
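If you script the deployment, the spec.yaml file used above can be generated from the two IDs instead of being edited by hand. This is a minimal sketch under that assumption: the `render_spec` and `write_spec` helpers are hypothetical, and the template text is copied from the specification in this section.

```python
from string import Template

# Specification text from this tutorial with the two IDs as placeholders.
SPEC_TEMPLATE = Template("""\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: $function_id
        service_account_id: $service_account_id
      operationId: for-recognizer-bot-function
""")


def render_spec(function_id, service_account_id):
    """Return the API gateway specification with both IDs filled in."""
    return SPEC_TEMPLATE.substitute(
        function_id=function_id,
        service_account_id=service_account_id,
    )


def write_spec(function_id, service_account_id, path="spec.yaml"):
    """Write the rendered specification to a file (spec.yaml by default)."""
    with open(path, "w", encoding="utf-8") as spec_file:
        spec_file.write(render_spec(function_id, service_account_id))
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which helps catch a missing ID before the file reaches `yc serverless api-gateway create`.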

To create an API gateway with Terraform:

Describe the parameters of the yandex_api_gateway resource in the configuration file:

resource "yandex_api_gateway" "recognizer-bot-api-gw" {
  name        = "recognizer-bot-api-gw"
  spec = <<-EOT
    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0

    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
  EOT
}

Where:

name: API gateway name.
spec: API gateway specification.

For more information about the resource parameters in Terraform, see the relevant provider documentation.

Make sure the configuration files are correct.
1. In the command line, go to the folder where you created the configuration file.
2. Run a check using this command:
```
terraform plan
```
If the configuration is described correctly, the terminal will display a list of created resources and their parameters. If the configuration contains any errors, Terraform will point them out.
Deploy cloud resources.
1. If the configuration does not contain any errors, run this command:
```
terraform apply
```
2. Confirm creating the resources: type yes in the terminal and press Enter.

To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.

Configure a link between the function and the Telegram bot

Set a webhook for your Telegram bot:

curl --request POST \
  --url https://api.telegram.org/bot<bot_token>/setWebhook \
  --header 'content-type: application/json' \
  --data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'

Where:

<bot_token>: Telegram bot token.
<API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway, including the https:// prefix.

Result:

{"ok":true,"result":true,"description":"Webhook was set"}
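The same setWebhook call can be made from Python if curl is not available. The sketch below uses only the standard library; the helper names `build_set_webhook_request` and `set_webhook` are hypothetical, and the code assumes the gateway domain may be given with or without the https:// prefix.

```python
import json
import urllib.request


def build_set_webhook_request(bot_token, gateway_domain,
                              path="/for-recognizer-bot-function"):
    """Build (but do not send) the setWebhook request for the bot."""
    domain = gateway_domain.strip().rstrip("/")
    if not domain.startswith("https://"):
        # Telegram only accepts HTTPS webhook URLs, so force the scheme.
        domain = "https://" + domain.split("://", 1)[-1]
    payload = json.dumps({"url": domain + path}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_webhook(bot_token, gateway_domain):
    """Send the request and return Telegram's decoded JSON reply."""
    request = build_set_webhook_request(bot_token, gateway_domain)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it possible to inspect the target URL and payload before anything is sent.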

Test the bot

Talk to the bot:

Open Telegram and search for the bot by the specified username.

Send the /start message to the chat.

The bot should respond with:

The bot can do the following:

* Recognize text from images.
* Generate voice messages from text.
* Convert voice messages to text.

Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech.
Send an image with text to the chat. The bot will respond with a message containing the recognized text.

Note

The image must meet the requirements.
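If the bot replies with an error or stays silent for an image, it can help to test the text extraction offline. The sketch below mirrors how `image_analyze` in index.py walks the response (blocks, then lines, then words), but tolerates missing keys. The `extract_text` name is hypothetical, and the sample response is hand-made to match the fields the tutorial code reads; it is not real Vision OCR output.

```python
def extract_text(ocr_response):
    """Join recognized words into text, one output line per OCR line."""
    annotation = ocr_response.get("result", {}).get("textAnnotation", {})
    lines = []
    for block in annotation.get("blocks", []):
        for line in block.get("lines", []):
            words = [word["text"] for word in line.get("words", []) if word.get("text")]
            if words:
                lines.append(" ".join(words))
    return "\n".join(lines)


# Hand-made sample shaped like the fields index.py accesses.
sample_response = {
    "result": {
        "textAnnotation": {
            "blocks": [
                {"lines": [
                    {"words": [{"text": "Hello"}, {"text": "world"}]},
                    {"words": [{"text": "Test"}]},
                ]}
            ]
        }
    }
}

print(extract_text(sample_response))
```

When nothing is recognized, the function returns an empty string, so the caller can send a fallback reply instead of an empty message, which Telegram rejects.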

How to delete the resources you created

To stop paying for the resources you created:

Delete the API Gateway API gateway.
Delete the Cloud Functions function.
