Creating a Telegram bot for text recognition in images, speech synthesis, and audio recognition

Updated at May 5, 2026

In this tutorial, you will learn how to create a Telegram bot that can:

Convert text messages to speech and transcribe voice messages using the Yandex SpeechKit Python SDK.
Recognize text in images with Yandex Vision OCR.

The bot authenticates with Yandex Cloud services as a service account, using an IAM token. The token is available in the context of the handler function, which manages user interaction with the bot.

A Yandex API Gateway API gateway will accept the requests that Telegram sends for your bot and forward them to the Yandex Cloud Functions handler function for processing.
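As a rough illustration of how the token from the function context ends up in a request, the sketch below builds the headers that the tutorial's code later sends to Vision OCR. The helper name `build_auth_headers`, the stand-in context object, and the sample token and folder ID are all hypothetical; only `context.token["access_token"]` and the two header names come from the tutorial's `index.py`.

```python
from types import SimpleNamespace


def build_auth_headers(context, folder_id):
    """Build request headers from the IAM token found in the function context."""
    # The tutorial's handler reads the token the same way:
    # context.token["access_token"].
    iam_token = context.token["access_token"]
    return {
        "Authorization": f"Bearer {iam_token}",
        "x-folder-id": folder_id,
    }


# Stand-in for the real context object, used only for a local dry run.
fake_context = SimpleNamespace(token={"access_token": "t1.example-token"})
headers = build_auth_headers(fake_context, "b1g-example-folder")
print(headers["Authorization"])
```

Inside a deployed function, the real `context` argument of the handler would be passed instead of the stand-in.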

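To create a bot:

1. Set up the required resources.
2. Register your Telegram bot.
3. Create a function.
4. Create an API gateway.
5. Configure a link between the function and the Telegram bot.
6. Test your bot.

If you no longer need the resources you created, delete them.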

Getting started

Navigate to the management console and log in to Yandex Cloud or create a new account.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

Learn more about clouds and folders here.

Required paid resources

The cost of Telegram bot support includes:

Fee for using SpeechKit (see SpeechKit pricing).
Fee for using Vision OCR (see Vision OCR pricing).
Fees based on the number of function calls, computing resources allocated for function execution, and outbound traffic (see Cloud Functions pricing).
Fees based on the API gateway request count and outbound traffic (see API Gateway pricing).

Set up the required resources

Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.
Download the FFmpeg package archive to ensure the SpeechKit Python SDK works correctly in the function runtime environment.
Extract the ffmpeg and ffprobe binary files from the archive and make them executable by running the following commands:
```
chmod +x ffmpeg
chmod +x ffprobe
```

Create a ZIP archive containing the function code:

Create a file named index.py and paste the following code into it.

index.py

```
import logging
import requests
import telebot
import json
import os
import base64
from speechkit import model_repository, configure_credentials, creds
from speechkit.stt import AudioProcessingType


# Filled in by the handler on each invocation
folder_id = ""
iam_token = ''

# Telegram bot token and the Vision OCR endpoint

API_TOKEN = os.environ['TELEGRAM_TOKEN']
vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'

# Adding the directory with the ffmpeg and ffprobe binaries to the system PATH

path = os.environ.get("PATH", "")
os.environ["PATH"] = path + ':/function/code'

logger = telebot.logger
telebot.logger.setLevel(logging.INFO)
bot = telebot.TeleBot(API_TOKEN, threaded=False)

# Getting the folder ID: function version -> function -> folder

def get_folder_id(iam_token, version_id):
    headers = {'Authorization': f'Bearer {iam_token}'}
    function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                   headers=headers)
    function_id_data = function_id_req.json()
    function_id = function_id_data['functionId']
    folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                 headers=headers)
    folder_id_data = folder_id_req.json()
    folder_id = folder_id_data['folderId']
    return folder_id

# Passing the Telegram update from the request body to the bot handlers

def process_event(event):
    request_body_dict = json.loads(event['body'])
    update = telebot.types.Update.de_json(request_body_dict)

    bot.process_new_updates([update])

# Function entry point

def handler(event, context):
    global iam_token, folder_id
    iam_token = context.token["access_token"]
    version_id = context.function_version
    folder_id = get_folder_id(iam_token, version_id)

    # Authenticating in SpeechKit with an IAM token
    configure_credentials(
        yandex_credentials=creds.YandexCredentials(
            iam_token=iam_token
        )
    )

    process_event(event)
    return {
        'statusCode': 200
    }

# Command and message handlers

@bot.message_handler(commands=['help', 'start'])
def send_welcome(message):
    bot.reply_to(message,
                 "The bot can do the following:\n* Recognize text in images.\n* Generate voice messages from text.\n* Convert voice messages to text.")

# Text message: reply with synthesized speech

@bot.message_handler(func=lambda message: True, content_types=['text'])
def echo_message(message):
    export_path = '/tmp/audio.ogg'
    synthesize(message.text, export_path)
    with open(export_path, 'rb') as voice:
        bot.send_voice(message.chat.id, voice)

# Voice message: reply with the transcribed text

@bot.message_handler(func=lambda message: True, content_types=['voice'])
def echo_audio(message):
    file_id = message.voice.file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    response_text = audio_analyze(downloaded_file)
    bot.reply_to(message, response_text)

# Photo: reply with the text recognized in the largest available size

@bot.message_handler(func=lambda message: True, content_types=['photo'])
def echo_photo(message):
    file_id = message.photo[-1].file_id
    file_info = bot.get_file(file_id)
    downloaded_file = bot.download_file(file_info.file_path)
    image_data = base64.b64encode(downloaded_file).decode('utf-8')
    response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
    bot.reply_to(message, response_text)

# Image recognition

def image_analyze(vision_url, iam_token, folder_id, image_data):
    response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
        "mimeType": "image",
        "languageCodes": ["en", "ru"],
        "model": "page",
        "content": image_data
        })
    blocks = response.json()['result']['textAnnotation']['blocks']
    text = ''
    for block in blocks:
        for line in block['lines']:
            for word in line['words']:
                text += word['text'] + ' '
            text += '\n'
    return text

# Speech recognition

def audio_analyze(audio_data):
    model = model_repository.recognition_model()

    # Recognition settings
    model.model = 'general'
    model.language = 'ru-RU'
    model.audio_processing_type = AudioProcessingType.Full

    result = model.transcribe(audio_data)
    speech_text = [res.normalized_text for res in result]
    return ' '.join(speech_text)

# Speech synthesis

def synthesize(text, export_path):
    model = model_repository.synthesis_model()

    # Synthesis settings
    model.voice = 'kirill'

    result = model.synthesize(text, raw_format=False)
    result.export(export_path, 'ogg')
```
Create a file named requirements.txt. In this file, specify the bot library and the Python SDK library:
```
pyTelegramBotAPI==4.27
yandex-speechkit==1.5.0
```
Add the index.py, requirements.txt, ffmpeg, and ffprobe files to index.zip.
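If you prefer to script the packaging step, the sketch below shows one possible way to do it with Python's standard `zipfile` module. The helper name `build_archive` and the placeholder files are hypothetical; in practice you would point it at the directory holding your real `index.py`, `requirements.txt`, `ffmpeg`, and `ffprobe`. On POSIX systems, `zipfile` records each file's permission bits when adding it, so the executable flag set with `chmod` should be carried into the archive; whether it is honored on extraction depends on the tool that unpacks it.

```python
import os
import stat
import tempfile
import zipfile


def build_archive(source_dir, file_names, archive_path):
    """Pack the listed files into a ZIP archive, all at the archive root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in file_names:
            # arcname keeps every file at the root of the archive, next to
            # index.py, which is where the function code looks for ffmpeg.
            archive.write(os.path.join(source_dir, name), arcname=name)
    return archive_path


# Dry run with placeholder files instead of the real code and binaries.
with tempfile.TemporaryDirectory() as workdir:
    names = ["index.py", "requirements.txt", "ffmpeg", "ffprobe"]
    for name in names:
        with open(os.path.join(workdir, name), "w") as placeholder:
            placeholder.write("placeholder\n")
    for binary in ("ffmpeg", "ffprobe"):
        os.chmod(os.path.join(workdir, binary), 0o755)

    archive_file = build_archive(workdir, names, os.path.join(workdir, "index.zip"))
    with zipfile.ZipFile(archive_file) as archive:
        packed_names = archive.namelist()
        ffmpeg_mode = archive.getinfo("ffmpeg").external_attr >> 16

print(packed_names)
print("ffmpeg marked executable:", bool(ffmpeg_mode & stat.S_IXUSR))
```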

Create an Object Storage bucket and upload your ZIP archive to it.

Register your Telegram bot

Launch BotFather and send it the following command:
```
/newbot
```
In the name field, specify the new bot’s name. This is the name users will see when chatting with the bot.
In the username field, specify the new bot’s username. You can use it to find the bot in Telegram. The username must end with ...Bot or ..._bot.

In the end, you will get a token. Save it, as you will need it later.

Create a function

Create a function that will handle user actions in the chat.

You can create the function using the management console, CLI, Terraform, or API.

Management console

In the management console, select the folder where you want to create your function.
Go to Cloud Functions.
Create a function:
1. Click Create function.
2. Specify the function name: for-recognizer-bot.
3. Click Create.
Create a function version:
1. Select Python as the runtime environment, disable Add files with code examples, and click Continue.
2. Specify the upload method Object Storage and select the bucket you created earlier. In the Object field, specify the file name: index.zip.
3. Specify the entry point: index.handler.
4. Under Parameters, specify:
  - Timeout: 30.
  - Memory: 256 MB.
  - Service account: recognizer-bot-sa.
  - Environment variables:
    - TELEGRAM_TOKEN: Your Telegram bot token.
5. Click Save changes.

CLI

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder used by default is the one specified when creating the CLI profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also specify a different folder for any command using --folder-name or --folder-id. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.

Create a function named for-recognizer-bot:

```
yc serverless function create --name=for-recognizer-bot
```

Result:

```
id: b09bhaokchn9********
folder_id: aoek49ghmknn********
created_at: "2023-03-21T10:03:37.475Z"
name: for-recognizer-bot
log_group_id: eolm8aoq9vcp********
http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
status: ACTIVE
```

Create a version of the for-recognizer-bot function:

```
yc serverless function version create \
  --function-name for-recognizer-bot \
  --memory=256m \
  --execution-timeout=30s \
  --runtime=python312 \
  --entrypoint=index.handler \
  --service-account-id=<service_account_ID> \
  --environment TELEGRAM_TOKEN=<bot_token> \
  --package-bucket-name=<bucket_name> \
  --package-object-name=index.zip
```

Where:

--function-name: Name of the function whose version you are creating.
--memory: Amount of RAM.
--execution-timeout: Maximum function runtime before timeout.
--runtime: Runtime environment.
--entrypoint: Entry point.
--service-account-id: recognizer-bot-sa service account ID.
--environment: Environment variables.
--package-bucket-name: Bucket name.
--package-object-name: Key of the index.zip file in the bucket.

Result:

```
done (1s)
id: d4e6qqlh53nu********
function_id: d4emc80mnp5n********
created_at: "2025-03-22T16:49:41.800Z"
runtime: python312
entrypoint: index.handler
resources:
  memory: "268435456"
execution_timeout: 30s
service_account_id: aje20nhregkc********
image_size: "4096"
status: ACTIVE
tags:
  - $latest
log_group_id: ckgmc3l93cl0********
environment:
  TELEGRAM_TOKEN: <bot_token>
log_options:
  folder_id: b1g86q4m5vej********
```

Terraform

With Terraform, you can quickly create a cloud infrastructure in Yandex Cloud and manage it using configuration files. These files store the infrastructure description written in HashiCorp Configuration Language (HCL). If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed, and what should be added or removed.

Terraform is distributed under the Business Source License. The Yandex Cloud provider for Terraform is distributed under the MPL-2.0 license.

For more information about the provider resources, see the relevant documentation on the Terraform website or its mirror.

If you do not have Terraform yet, install it and configure the Yandex Cloud provider.

Describe your function parameters in the configuration file:
```
resource "yandex_function" "for-recognizer-bot-function" {
  name               = "for-recognizer-bot"
  user_hash          = "first function"
  runtime            = "python312"
  entrypoint         = "index.handler"
  memory             = "256"
  execution_timeout  = "30"
  service_account_id = "aje20nhregkcvu******"
  environment = {
    TELEGRAM_TOKEN = "<bot_token>"
  }
  package {
    bucket_name = "<bucket_name>"
    object_name = "index.zip"
  }
}
```
Where:
- name: Function name.
- user_hash: User-defined string that identifies the function version.
- runtime: Function runtime environment.
- entrypoint: Entry point.
- memory: Amount of memory allocated for the function, in MB.
- execution_timeout: Function runtime timeout.
- service_account_id: recognizer-bot-sa service account ID.
- environment: Environment variables.
- package: Bucket name and object key of the previously uploaded index.zip archive with the function source code.
For more information about yandex_function resource properties, see this provider guide.
Validate your configuration files.
1. In the terminal, navigate to the directory where you created your configuration file.
2. Run a check using the following command:
```
terraform plan
```
If your configuration is correct, the terminal will display a list of the resources to be created and their settings. Otherwise, Terraform will show any detected errors.
Deploy the cloud resources.
1. If the configuration is correct, run this command:
```
terraform apply
```
2. To confirm the function creation, type yes in the terminal and press Enter.

API

To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.

To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.

Create an API gateway

The Telegram server will notify your bot of new messages via a webhook. The API gateway will receive these webhook requests from Telegram and forward them to the for-recognizer-bot function for processing.
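The flow described above can be tried locally with a simulated event. This is only a sketch: the sample event is hand-written, the `isBase64Encoded` check is an assumption about how the gateway may deliver the body, and the helper name `parse_update` is hypothetical. The tutorial's own handler passes the parsed body to the telebot library instead of printing it.

```python
import base64
import json


def parse_update(event):
    """Return the Telegram update carried in a gateway event as a dict."""
    body = event["body"]
    # Assumption: the body may arrive base64-encoded and be flagged as such.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def handler(event, context):
    """Minimal stand-in for the tutorial's handler: log the message, return 200."""
    update = parse_update(event)
    message = update.get("message", {})
    print("chat:", message.get("chat", {}).get("id"), "text:", message.get("text"))
    # Like the tutorial's handler, answer with status code 200 once done.
    return {"statusCode": 200}


# Hand-written event that mimics a webhook call for the /start command.
sample_event = {
    "body": json.dumps(
        {
            "update_id": 1,
            "message": {"message_id": 10, "chat": {"id": 42}, "text": "/start"},
        }
    ),
    "isBase64Encoded": False,
}

response = handler(sample_event, None)
print(response)
```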

You can create the API gateway using the management console, CLI, Terraform, or API.

Management console

In the management console, select the folder where you want to create an API gateway.
Go to API Gateway.
Click Create API gateway.
In the Name field, specify recognizer-bot-api-gw.

Under Specification, add the following specification:

```
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
```

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.

Click Create.
Select the previously created API gateway. Save the Default domain value, as you will need it later.

CLI

Save the following specification to spec.yaml:

```
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: <function_ID>
        service_account_id: <service_account_ID>
      operationId: for-recognizer-bot-function
```

Where:

function_id: for-recognizer-bot function ID.
service_account_id: recognizer-bot-sa service account ID.

Run this command:

```
yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
```

Where:

--name: API gateway name.
--spec: Specification file.

Result:

```
done (5s)
id: d5d1ud9bli1e********
folder_id: b1gc1t4cb638********
created_at: "2023-09-25T16:01:48.926Z"
name: recognizer-bot-api-gw
status: ACTIVE
domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
log_group_id: ckgefpleo5eg********
connectivity: {}
log_options:
  folder_id: b1gc1t4cb638********
```
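If you generate spec.yaml from a script rather than editing it by hand, the two placeholders can be filled in with a template. The sketch below is one possible approach using only the standard library; the function name `render_spec` and the sample IDs are hypothetical, and the specification text is the one from this tutorial.

```python
from string import Template

# The tutorial's specification with $-placeholders for the two IDs.
SPEC_TEMPLATE = Template("""\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: $function_id
        service_account_id: $service_account_id
      operationId: for-recognizer-bot-function
""")


def render_spec(function_id, service_account_id):
    """Return the gateway specification with both IDs filled in."""
    return SPEC_TEMPLATE.substitute(
        function_id=function_id,
        service_account_id=service_account_id,
    )


# Made-up IDs; use the values from your own function and service account.
spec_text = render_spec("d4e-example-function", "aje-example-account")
print(spec_text)
```

Writing `spec_text` to a file named spec.yaml would then give you the input for the `yc serverless api-gateway create` command.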

Terraform

To create an API gateway:

Specify the yandex_api_gateway resource parameters in the configuration file:

```
resource "yandex_api_gateway" "recognizer-bot-api-gw" {
  name        = "recognizer-bot-api-gw"
  spec = <<-EOT
    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0

    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
  EOT
}
```

Where:

name: API gateway name.
spec: API gateway specification.

For more information about Terraform resource parameters, see this provider guide.

Validate your configuration files.
1. In the terminal, navigate to the directory where you created your configuration file.
2. Run a check using the following command:
```
terraform plan
```
If your configuration is correct, the terminal will display a list of the resources to be created and their settings. Otherwise, Terraform will show any detected errors.
Deploy the cloud resources.
1. If the configuration is correct, run this command:
```
terraform apply
```
2. To confirm resource creation, type yes and press Enter.

API

To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.

Configure a link between the function and the Telegram bot

Set up a webhook for your Telegram bot:

```
curl --request POST \
  --url 'https://api.telegram.org/bot<bot_token>/setWebhook' \
  --header 'content-type: application/json' \
  --data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'
```

Where:

<bot_token>: Telegram bot token.
<API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway, including the https:// prefix (Telegram accepts only HTTPS webhook URLs).

Result:

```
{"ok":true,"result":true,"description":"Webhook was set"}
```
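The same setWebhook call can also be prepared from Python. The sketch below only builds the request so that you can inspect it before sending; the helper name `build_set_webhook_request`, the sample token, and the sample domain are hypothetical, while the URL pattern and the payload mirror the curl command above.

```python
import json
import urllib.request


def build_set_webhook_request(bot_token, gateway_domain):
    """Prepare (but do not send) the setWebhook request from the curl example."""
    payload = {"url": f"{gateway_domain}/for-recognizer-bot-function"}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{bot_token}/setWebhook",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up values; substitute your bot token and API gateway domain.
request = build_set_webhook_request(
    "123456:EXAMPLE", "https://example.apigw.yandexcloud.net"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen(request)` would actually send it.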

Test your bot

Chat with the bot:

Open Telegram and find the bot by its username.

Send /start to the chat.

The bot should respond with:

The bot can do the following:

* Recognize text in images.
* Generate voice messages from text.
* Convert voice messages to text.

Send a text message to the chat. The bot will respond with a voice message generated from your text.
Send a voice message to the chat. The bot will respond with a text message transcribed from your speech.
Send an image containing text to the chat. The bot will respond with a message containing the recognized text.

Note

The image must meet the Vision OCR requirements for input images.

How to delete the resources you created

To avoid incurring charges for resources you no longer need, delete them:

Delete the API Gateway.
Delete the function in Cloud Functions.
