Developing a Telegram bot for text recognition in images, audio synthesis and recognition

Updated at November 6, 2025

In this tutorial, you will create a Telegram bot that can:

Synthesize speech from a message text and recognize speech in voice messages using the Yandex SpeechKit Python SDK.
Recognize text in images using Yandex Vision OCR.

The function authenticates with Yandex Cloud services as a service account, using an IAM token. The token is available in the handler context of the function that handles the conversation with the bot.
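
As a minimal sketch of this step, the helpers below read the token from the handler context and build the authorization header. The `context.token["access_token"]` access matches the function code later in this tutorial; the helper names `iam_token_from_context` and `bearer_headers` are hypothetical and used only for illustration.

```python
def iam_token_from_context(context):
    """Return the IAM token that the runtime places in the handler context.

    Raises RuntimeError if the context carries no token, which typically
    means no service account is attached to the function version.
    """
    token = getattr(context, "token", None)
    if not token or "access_token" not in token:
        raise RuntimeError("IAM token is missing from the handler context")
    return token["access_token"]


def bearer_headers(iam_token):
    """Build the Authorization header used for Yandex Cloud API calls."""
    return {"Authorization": f"Bearer {iam_token}"}
```

Failing early with a clear message can make a missing service account easier to spot than a later authorization error from SpeechKit or Vision OCR.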

The Yandex API Gateway will receive requests from your bot and forward them to Yandex Cloud Functions for processing.

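To create a bot:

Get started.
Set up resources.
Register your Telegram bot.
Create a function.
Create an API gateway.
Configure a link between the function and the Telegram bot.
Test the bot.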

If you no longer need the resources you created, delete them.

Getting started

Navigate to the management console and log in to Yandex Cloud or create a new account.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure.

Learn more about clouds and folders here.

Required paid resources

The cost of Telegram bot support includes:

Fee for using SpeechKit (see SpeechKit pricing).
Fee for using Vision OCR (see Vision OCR pricing).
Fee for function invocation count, computing resources allocated to run the function, and outbound traffic (see Cloud Functions pricing).
Fee for the number of requests to the API gateway and outbound traffic (see API Gateway pricing).

Set up resources

Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
Download the archive with the FFmpeg package for the SpeechKit Python SDK to work correctly in the function execution environment.
Extract the ffmpeg and ffprobe binary files from the archive and run these commands to make them executable:
```
chmod +x ffmpeg
chmod +x ffprobe
```

Create a ZIP archive with the function code:

Create a file named index.py and paste the following code into it.

index.py

```
import logging
import requests
import telebot
import json
import os
import base64
from speechkit import model_repository, configure_credentials, creds
from speechkit.stt import AudioProcessingType


folder_id = ""
iam_token = ''

# Image recognition service endpoint and authentication data

API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'

# Adding the folder with ffmpeg to the system PATH

path = os.environ.get("PATH")
os.environ["PATH"] = path + ':/function/code'

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID

def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                   headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                 headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

def process_event(event):
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)

    bot.process_new_updates([update])

def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)

    # Authenticating in SpeechKit with an IAM token
    configure_credentials(
        yandex_credentials=creds.YandexCredentials(
            iam_token=iam_token
        )
    )

    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message handlers

@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n* Recognize text from images.\n* Generate voice messages from text.\n* Convert voice messages to text.")

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    export_path = '/tmp/audio.ogg'
    synthesize(message.text, export_path)
    with open(export_path, 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(downloaded_file)
    bot.reply_to(message, response_text)

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition

def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
        "mimeType": "image",
        "languageCodes": ["en", "ru"],
        "model": "page",
        "content": image_data
        })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition

def audio_analyze(audio_data):
    model = model_repository.recognition_model()

    # Recognition settings
    model.model = 'general'
    model.language = 'ru-RU'
    model.audio_processing_type = AudioProcessingType.Full

    result = model.transcribe(audio_data)
    speech_text = [res.normalized_text for res in result]
    return ' '.join(speech_text)

# Speech synthesis

def synthesize(text, export_path):
    model = model_repository.synthesis_model()

    # Synthesis settings
    model.voice = 'kirill'

    result = model.synthesize(text, raw_format=False)
    result.export(export_path, 'ogg')
```
Create a file named requirements.txt. In this file, list the Telegram bot library and the SpeechKit Python SDK:
```
pyTelegramBotAPI==4.27
yandex-speechkit==1.5.0
```
Add the index.py, requirements.txt, ffmpeg, and ffprobe files to a ZIP archive named index.zip.
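
The archive can also be built from a script. The sketch below uses the standard `zipfile` module and places all four files in the archive root; the helper name `build_archive` is hypothetical. `ZipFile.write()` stores each file's Unix permission bits, so the executable flag set by `chmod` is recorded in the archive. Whether the runtime restores those bits on unpacking is an assumption here, not something this tutorial states.

```python
import os
import zipfile


def build_archive(file_paths, archive_path="index.zip"):
    """Pack the given files into the root of a ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            # arcname drops any directory prefix so every file lands in the root
            archive.write(path, arcname=os.path.basename(path))
    return archive_path


# Example call, run from the directory that holds the four files:
# build_archive(["index.py", "requirements.txt", "ffmpeg", "ffprobe"])
```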

Create an Object Storage bucket and upload the ZIP archive to it.

Register your Telegram bot

Start BotFather and send it the following command:
```
/newbot
```
In the name field, enter a name for the new bot. This is the name users will see when chatting with the bot.
In the username field, enter a username for the new bot. You can use it to find the bot in Telegram. The username must end with ...Bot or ..._bot.

Once done, you will get a token. Save it, as you will need it later.
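
If you pick usernames in a script, a quick local check can save a round trip to BotFather. The sketch below encodes the ending rule stated above; the length limit (5 to 32 characters) and the allowed character set (Latin letters, digits, underscores) are assumptions about Telegram's username rules, and `is_valid_bot_username` is a hypothetical helper.

```python
import re

# Assumed rules: 5-32 characters from [A-Za-z0-9_], ending in "bot" in any case.
_BOT_USERNAME = re.compile(r"[A-Za-z0-9_]{2,29}bot", re.IGNORECASE)


def is_valid_bot_username(username):
    """Return True if the username looks acceptable for a new bot."""
    return _BOT_USERNAME.fullmatch(username) is not None
```

BotFather remains the final authority: it also rejects usernames that are already taken, which no local check can detect.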

Create a function

Create a function to process user actions in the chat.

Management console

In the management console, select the folder where you want to create a function.
In the list of services, select Cloud Functions.
Create a function:
1. Click Create function.
2. Enter the function name: for-recognizer-bot.
3. Click Create.
Create a function version:
1. Select Python as the runtime environment, disable Add files with code examples, and click Continue.
2. Specify the upload method Object Storage and select the bucket you created earlier. In the Object field, specify the file name: index.zip.
3. Specify the entry point: index.handler.
4. Under Parameters, specify:
  - Timeout: 30 seconds.
  - Memory: 256 MB.
  - Service account: recognizer-bot-sa.
  - Environment variables:
    - TELEGRAM_TOKEN: Your Telegram bot token.
5. Click Save changes.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

Create a function named for-recognizer-bot:

```
yc serverless function create --name=for-recognizer-bot
```

Result:

```
id: b09bhaokchn9********
folder_id: aoek49ghmknn********
created_at: "2023-03-21T10:03:37.475Z"
name: for-recognizer-bot
log_group_id: eolm8aoq9vcp********
http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
status: ACTIVE
```

Create a version of the for-recognizer-bot function:

```
yc serverless function version create \
  --function-name for-recognizer-bot \
  --memory=256m \
  --execution-timeout=30s \
  --runtime=python312 \
  --entrypoint=index.handler \
  --service-account-id=<service_account_ID> \
  --environment TELEGRAM_TOKEN=<bot_token> \
  --package-bucket-name=<bucket_name> \
  --package-object-name=index.zip
```

Where:

--function-name: Name of the function whose version you are creating.
--memory: Amount of RAM.
--execution-timeout: Maximum function running time before timeout.
--runtime: Runtime environment.
--entrypoint: Entry point.
--service-account-id: recognizer-bot-sa service account ID.
--environment: Environment variables.
--package-bucket-name: Bucket name.
--package-object-name: Key of the index.zip file in the bucket.

Result:

```
done (1s)
id: d4e6qqlh53nu********
function_id: d4emc80mnp5n********
created_at: "2025-03-22T16:49:41.800Z"
runtime: python312
entrypoint: index.handler
resources:
  memory: "268435456"
execution_timeout: 30s
service_account_id: aje20nhregkc********
image_size: "4096"
status: ACTIVE
tags:
  - $latest
log_group_id: ckgmc3l93cl0********
environment:
  TELEGRAM_TOKEN: <bot_token>
log_options:
  folder_id: b1g86q4m5vej********
```
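
The `memory` value in the result is given in bytes, which is why `--memory=256m` shows up as `268435456`. A quick check of the arithmetic, assuming the `m` suffix means mebibytes (this is consistent with the output above); `mebibytes_to_bytes` is a hypothetical helper:

```python
def mebibytes_to_bytes(mebibytes):
    """Convert a size in MiB to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mebibytes * 1024 * 1024


print(mebibytes_to_bytes(256))  # 268435456, the value shown in the result
```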

Terraform

With Terraform, you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description written in HashiCorp Configuration Language (HCL). If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed, and what should be added or removed.

Terraform is distributed under the Business Source License. The Yandex Cloud provider for Terraform is distributed under the MPL-2.0 license.

For more information about the provider resources, see the relevant documentation on the Terraform website or its mirror.

If you do not have Terraform yet, install it and configure the Yandex Cloud provider.

In the configuration file, describe the function settings:
```
resource "yandex_function" "for-recognizer-bot-function" {
  name               = "for-recognizer-bot"
  user_hash          = "first function"
  runtime            = "python312"
  entrypoint         = "index.handler"
  memory             = "256"
  execution_timeout  = "30"
  service_account_id = "aje20nhregkcvu******"
  environment = {
    TELEGRAM_TOKEN = "<bot_token>"
  }
  package {
    bucket_name = "<bucket_name>"
    object_name = "index.zip"
  }
}
```
Where:
- name: Function name.
- user_hash: Custom string to define the function version.
- runtime: Function runtime environment.
- entrypoint: Entry point.
- memory: Amount of memory allocated for the function, in MB.
- execution_timeout: Function execution timeout.
- service_account_id: recognizer-bot-sa service account ID.
- environment: Environment variables.
- package: Bucket name and object key of the uploaded index.zip archive with the function source code.
For more information about yandex_function properties, see the relevant provider documentation.
Make sure the configuration files are correct.
1. In the command line, navigate to the directory where you created the configuration file.
2. Run a check using this command:
```
terraform plan
```
If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.
Deploy the cloud resources.
1. If the configuration does not contain any errors, run this command:
```
terraform apply
```
2. Confirm creating the function by typing yes in the terminal and pressing Enter.

API

To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.

To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
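
For orientation, the sketch below prepares a REST request for creating the function without sending it. The host and the `functions/v1/functions` path come from the function code in this tutorial. The use of `POST` on that collection and the `folderId` and `name` body fields are assumptions about the create method, so check them against the API reference before use. The helper name is hypothetical.

```python
import json
import urllib.request

FUNCTIONS_API = "https://serverless-functions.api.cloud.yandex.net/functions/v1/functions"


def build_create_function_request(iam_token, folder_id, name="for-recognizer-bot"):
    """Prepare (but do not send) a request that creates a function."""
    # Assumed request body: the target folder and the function name
    body = json.dumps({"folderId": folder_id, "name": name}).encode("utf-8")
    return urllib.request.Request(
        FUNCTIONS_API,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {iam_token}",
            "Content-Type": "application/json",
        },
    )


# To send the prepared request: urllib.request.urlopen(request)
```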

Create an API gateway

The Telegram server will notify your bot of new messages using a webhook. The API gateway will receive requests on the bot side and forward them to the for-recognizer-bot function for processing.
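
On the function side, the forwarded webhook request arrives as the `event` argument, and the function code in this tutorial reads the Telegram update from `event['body']`. The sketch below shows that step in isolation. The `isBase64Encoded` flag is an assumption about the event format, and `extract_update` is a hypothetical helper.

```python
import base64
import json


def extract_update(event):
    """Return the Telegram update (a dict) carried in the function event."""
    body = event.get("body") or "{}"
    # Assumption: the body may be Base64-encoded, signaled by this flag
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)
```

The resulting dictionary is what `telebot.types.Update.de_json()` receives in `process_event`.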

Management console

In the management console, select the folder where you want to create an API gateway.
In the list of services, select API Gateway.
Click Create API gateway.
In the Name field, enter recognizer-bot-api-gw.

Under Specification, add the following specification:

```
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
```

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.

Click Create.
Select the created API gateway. Save the Default domain field value. You will need it later.

CLI

Save the following specification to spec.yaml:

```
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
```

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.
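
If you prefer to generate spec.yaml rather than edit the placeholders by hand, a small script can fill in the two IDs. This is a sketch using the standard `string.Template` class; `render_spec` and `write_spec` are hypothetical helper names, and the IDs you pass in are your own.

```python
from string import Template

SPEC_TEMPLATE = Template("""\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: $function_id
        service_account_id: $service_account_id
      operationId: for-recognizer-bot-function
""")


def render_spec(function_id, service_account_id):
    """Fill the two placeholders of the gateway specification."""
    return SPEC_TEMPLATE.substitute(
        function_id=function_id,
        service_account_id=service_account_id,
    )


def write_spec(function_id, service_account_id, path="spec.yaml"):
    """Write the rendered specification to a file."""
    with open(path, "w", encoding="utf-8") as spec_file:
        spec_file.write(render_spec(function_id, service_account_id))
```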

Run this command:

```
yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
```

Where:

--name: API gateway name.
--spec: Specification file.

Result:

```
done (5s)
id: d5d1ud9bli1e********
folder_id: b1gc1t4cb638********
created_at: "2023-09-25T16:01:48.926Z"
name: recognizer-bot-api-gw
status: ACTIVE
domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
log_group_id: ckgefpleo5eg********
connectivity: {}
log_options:
  folder_id: b1gc1t4cb638********
```

Terraform

To create an API gateway:

Describe the yandex_api_gateway properties in the configuration file:

```
resource "yandex_api_gateway" "recognizer-bot-api-gw" {
  name        = "recognizer-bot-api-gw"
  spec = <<-EOT
    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0

    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
  EOT
}
```

Where:

name: API gateway name.
spec: API gateway specification.

For more information about yandex_api_gateway properties, see the relevant provider documentation.

Make sure the configuration files are correct.
1. In the command line, navigate to the directory where you created the configuration file.
2. Run a check using this command:
```
terraform plan
```
If the configuration description is correct, the terminal will display a list of the resources being created and their settings. If the configuration contains any errors, Terraform will point them out.
Deploy the cloud resources.
1. If the configuration does not contain any errors, run this command:
```
terraform apply
```
2. Confirm creating the resources: type yes in the terminal and press Enter.

API

To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.

Configure a link between the function and the Telegram bot

Set a webhook for your Telegram bot:

```
curl --request POST \
  --url https://api.telegram.org/bot<bot_token>/setWebhook \
  --header 'content-type: application/json' \
  --data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'
```

Where:

<bot_token>: Telegram bot token.
<API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway. Telegram accepts only HTTPS webhook URLs, so make sure the value starts with https://.

Result:

```
{"ok":true,"result":true,"description":"Webhook was set"}
```
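
The same call can be prepared from Python. The sketch below builds the request with the standard library and does not send it; `build_set_webhook_request` is a hypothetical helper, and the token and domain you pass in are your own. It adds the `https://` scheme when the domain is given without one.

```python
import json
import urllib.request


def build_set_webhook_request(bot_token, gateway_domain,
                              path="/for-recognizer-bot-function"):
    """Prepare (but do not send) a setWebhook call for the bot."""
    if gateway_domain.startswith("http://"):
        raise ValueError("Telegram webhooks require an HTTPS URL")
    if not gateway_domain.startswith("https://"):
        gateway_domain = "https://" + gateway_domain
    payload = {"url": gateway_domain.rstrip("/") + path}
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# To send the prepared request: urllib.request.urlopen(request)
```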

Test the bot

Chat with the bot:

Open Telegram and search for the bot by the specified username.

Send /start to the chat.

The bot should respond with:

The bot can do the following:

* Recognize text from images.
* Generate voice messages from text.
* Convert voice messages to text.

Send a text message to the chat. The bot will respond with a voice message synthesized from your text.
Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech.
Send an image with text to the chat. The bot will respond with a message containing the recognized text.

Note

The image must meet the Vision OCR image requirements.
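
When an image contains no recognizable text, the OCR response may lack some of the nested keys that the function code reads. The sketch below is a more defensive variant of the text assembly step in `image_analyze`: it follows the same `result`, `textAnnotation`, `blocks`, `lines`, `words` structure used in the tutorial code, but treats missing keys as empty. `ocr_result_to_text` is a hypothetical helper, not part of the tutorial code.

```python
def ocr_result_to_text(ocr_response):
    """Join the recognized words into lines of text.

    Missing keys are treated as empty, so a response without text
    yields an empty string instead of raising KeyError.
    """
    blocks = (
        ocr_response.get("result", {})
        .get("textAnnotation", {})
        .get("blocks", [])
    )
    lines = []
    for block in blocks:
        for line in block.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return "\n".join(lines)
```

An empty result is worth checking before replying, since Telegram rejects messages with empty text.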

How to delete the resources you created

To stop paying for the resources you created:

Delete the API Gateway.
Delete the function in Cloud Functions.
