Yandex project
© 2025 Yandex.Cloud LLC
Developing a Telegram bot for text recognition in images, audio synthesis and recognition

Written by Yandex Cloud
Updated on May 7, 2025

In this tutorial, you will create a bot for Telegram that can:

  • Synthesize speech from a message text using the Yandex SpeechKit API v1.
  • Recognize speech in voice messages and convert it into text using the Yandex SpeechKit synchronous recognition API.
  • Recognize text in images using Yandex Vision OCR.

An API gateway created in Yandex API Gateway will receive webhook requests from Telegram and forward them to a Yandex Cloud Functions function for processing.
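
As a rough illustration of this flow, the sketch below maps an incoming Telegram update to the service that would handle it. The pick_service function and the labels it returns are hypothetical and exist only for this example; the text, voice, and photo fields follow the Telegram Bot API message format. In the tutorial itself, the same routing is done by the telebot message handlers in index.py.

```python
import json


def pick_service(update: dict) -> str:
    """Return a label for the service that would handle this Telegram update."""
    message = update.get("message", {})
    if "voice" in message:
        return "speechkit-recognition"  # voice message -> text
    if "photo" in message:
        return "vision-ocr"  # image -> recognized text
    text = message.get("text", "")
    if text.strip():
        # The /start and /help commands get the welcome message instead
        command = text.split(maxsplit=1)[0]
        if command in ("/start", "/help"):
            return "welcome"
        return "speechkit-synthesis"  # text -> voice message
    return "unsupported"


# A webhook request body similar to what the function receives from the gateway
body = json.dumps({"update_id": 1, "message": {"message_id": 5, "text": "hello"}})
print(pick_service(json.loads(body)))
```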

To create a bot:

  1. Get your cloud ready.
  2. Prepare the resources.
  3. Register your Telegram bot.
  4. Create a function.
  5. Create an API gateway.
  6. Link the function and the bot.
  7. Test the bot.

If you no longer need the resources you created, delete them.

Getting started

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resources

The cost of running the Telegram bot includes:

  • Fee for using SpeechKit (see SpeechKit pricing).
  • Fee for using Vision OCR (see Vision OCR pricing).
  • Fee for the number of function calls, computing resources allocated to executing the function, and outgoing traffic (see Cloud Functions pricing).
  • Fee for the number of requests to the API gateway and outgoing traffic (see API Gateway pricing).

Prepare the resources

  1. Create a service account named recognizer-bot-sa and assign it the ai.editor and functions.editor roles for your folder.

  2. Prepare a ZIP archive with the function code:

    1. Create a file named index.py and paste the code below into it.

      index.py
      import logging
      import requests
      import telebot
      import json
      import os
      import base64
      
      # Service endpoints and authentication data
      
      API_TOKEN = os.environ['TELEGRAM_TOKEN']
      vision_url = 'https://ocr.api.cloud.yandex.net/ocr/v1/recognizeText'
      speechkit_url = 'https://stt.api.cloud.yandex.net/speech/v1/stt:recognize'
      speechkit_synthesis_url = 'https://tts.api.cloud.yandex.net/speech/v1/tts:synthesize'
      folder_id = ""
      iam_token = ''
      
      logger = telebot.logger
      telebot.logger.setLevel(logging.INFO)
      bot = telebot.TeleBot(API_TOKEN, threaded=False)
      
      # Getting the folder ID
      
      def get_folder_id(iam_token, version_id):
      
          headers = {'Authorization': f'Bearer {iam_token}'}
          function_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/versions/{version_id}',
                                         headers=headers)
          function_id_data = function_id_req.json()
          function_id = function_id_data['functionId']
          folder_id_req = requests.get(f'https://serverless-functions.api.cloud.yandex.net/functions/v1/functions/{function_id}',
                                       headers=headers)
          folder_id_data = folder_id_req.json()
          folder_id = folder_id_data['folderId']
          return folder_id
      
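      def process_event(event):
          # The Telegram update arrives as a JSON string in the request body
          request_body_dict = json.loads(event['body'])
          update = telebot.types.Update.de_json(request_body_dict)
          # Dispatch the update to the matching message handler below
          bot.process_new_updates([update])
      
      def handler(event, context):
          global iam_token, folder_id
          # IAM token of the service account attached to the function version
          iam_token = context.token["access_token"]
          version_id = context.function_version
          folder_id = get_folder_id(iam_token, version_id)
          process_event(event)
          # Acknowledge the update so that Telegram does not resend it
          return {
              'statusCode': 200
          }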
      
      # Command and message listeners
      
      @bot.message_handler(commands=['help', 'start'])
      def send_welcome(message):
          bot.reply_to(message,
                       "The bot can do the following:\n\n* Recognize text from images.\n* Generate voice messages from text.\n* Convert voice messages to text.")
      
      @bot.message_handler(func=lambda message: True, content_types=['text'])
      def echo_message(message):
          # Synthesize speech from the message text and save it to a temporary file
          audio_path = '/tmp/audio.ogg'
          with open(audio_path, 'wb') as f:
              for audio_content in synthesize(folder_id, iam_token, message.text):
                  f.write(audio_content)
          # Send the file as a voice message and close it afterwards
          with open(audio_path, 'rb') as voice:
              bot.send_voice(message.chat.id, voice)
      
      @bot.message_handler(func=lambda message: True, content_types=['voice'])
      def echo_audio(message):
          file_id = message.voice.file_id
          file_info = bot.get_file(file_id)
          downloaded_file = bot.download_file(file_info.file_path)
          response_text = audio_analyze(speechkit_url, iam_token, folder_id, downloaded_file)
          bot.reply_to(message, response_text)
      
      @bot.message_handler(func=lambda message: True, content_types=['photo'])
      def echo_photo(message):
          file_id = message.photo[-1].file_id
          file_info = bot.get_file(file_id)
          downloaded_file = bot.download_file(file_info.file_path)
          image_data = base64.b64encode(downloaded_file).decode('utf-8')
          response_text = image_analyze(vision_url, iam_token, folder_id, image_data)
          bot.reply_to(message, response_text)
      
      # Image recognition
      
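      def image_analyze(vision_url, iam_token, folder_id, image_data):
          response = requests.post(vision_url, headers={'Authorization': 'Bearer '+iam_token, 'x-folder-id': folder_id}, json={
              "mimeType": "image",
              "languageCodes": ["en", "ru"],
              "model": "page",
              "content": image_data
              })
          # Return a short message instead of raising if the request failed
          if response.status_code != 200:
              return 'error'
          # Missing keys mean no text was found, so fall back to an empty list
          blocks = response.json().get('result', {}).get('textAnnotation', {}).get('blocks', [])
          lines = []
          for block in blocks:
              for line in block['lines']:
                  lines.append(' '.join(word['text'] for word in line['words']))
          # Telegram rejects empty messages, so return a placeholder if nothing was recognized
          return '\n'.join(lines) or 'No text recognized.'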
      
      # Speech recognition
      
      def audio_analyze(speechkit_url, iam_token, folder_id, audio_data):
          headers = {'Authorization': f'Bearer {iam_token}'}
          params = {
              "topic": "general",
              "folderId": f"{folder_id}",
              "lang": "ru-RU"}
      
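          audio_request = requests.post(speechkit_url, params=params, headers=headers, data=audio_data)
          response_data = audio_request.json()
          # SpeechKit reports failures in the error_code field
          if response_data.get("error_code") is not None:
              return 'error'
          # Telegram rejects empty messages, so return a placeholder for silent audio
          return response_data.get("result") or 'No speech recognized.'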
      
      # Speech synthesis
      
      def synthesize(folder_id, iam_token, text):
          headers = {
              'Authorization': 'Bearer ' + iam_token,
          }
      
          data = {
              'text': text,
              'lang': 'ru-RU',
              'voice': 'filipp',
              'folderId': folder_id
          }
      
          with requests.post(speechkit_synthesis_url, headers=headers, data=data, stream=True) as resp:
              if resp.status_code != 200:
                  raise RuntimeError("Invalid response received: code: %d, message: %s" % (resp.status_code, resp.text))
      
              for chunk in resp.iter_content(chunk_size=None):
                  yield chunk
      
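    2. Create a file named requirements.txt and list the library the function needs to work with the bot. The telebot module imported in index.py is provided by the pyTelegramBotAPI package on PyPI:

      pyTelegramBotAPI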
      
    3. Add both files to the index.zip archive.
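
If you prefer to build the archive with a script, the following is a minimal sketch that uses Python's standard zipfile module. The build_archive helper is a hypothetical name introduced for this example; it assumes that index.py and requirements.txt are in the same directory.

```python
import zipfile
from pathlib import Path


def build_archive(source_dir: str, archive_name: str = "index.zip") -> Path:
    """Pack index.py and requirements.txt into a flat ZIP archive."""
    source = Path(source_dir)
    archive_path = source / archive_name
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in ("index.py", "requirements.txt"):
            # arcname keeps both files at the archive root, next to each other
            archive.write(source / name, arcname=name)
    return archive_path
```

Calling build_archive(".") in the directory with both files creates index.zip there. Keeping the files at the archive root matters because the index.handler entry point refers to index.py at the top level.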

Register your Telegram bot

Register your bot in Telegram and get a token.

  1. Open a chat with BotFather and send it the following command:

    /newbot
    
  2. In the name field, enter a name for the new bot. This is the name the bot users will see.

  3. In the username field, enter a username for the new bot. You can use it to locate the bot in Telegram. The username must end with ...Bot or ..._bot.

    As a result, you will get a token. Save it. You will need it later.
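
One way to check that the saved token works is to call the getMe method of the Telegram Bot API. The sketch below uses only the standard library; get_me_url and check_token are hypothetical helper names, and the call needs network access.

```python
import json
import urllib.request


def get_me_url(token: str) -> str:
    """Build the Bot API URL for the getMe method."""
    return f"https://api.telegram.org/bot{token}/getMe"


def check_token(token: str) -> dict:
    """Call getMe and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(get_me_url(token), timeout=10) as resp:
        return json.load(resp)
```

For a valid token, check_token should return a dictionary with "ok" set to true and the bot's details in "result". For an invalid token, Telegram answers with an HTTP error status, which urlopen raises as urllib.error.HTTPError.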

Create a function

Create a function to process user actions in the chat.

Management console
  1. In the management console, select the folder where you want to create a function.

  2. In the list of services, select Cloud Functions.

  3. Create a function:

    1. Click Create function.
    2. Enter the function name: for-recognizer-bot.
    3. Click Create.
  4. Create a function version:

    1. Select the Python runtime environment, disable the Add files with code examples option, and click Continue.

    2. Specify the ZIP archive upload method and select the index.zip archive prepared earlier.

    3. Specify the entry point: index.handler.

    4. Under Parameters, specify:

      • Timeout: 30 seconds

      • Memory: 128 MB

      • Service account: recognizer-bot-sa

      • Environment variables:

        • TELEGRAM_TOKEN: Your Telegram bot token.
    5. Click Save changes.

Yandex Cloud CLI

  1. Create a function named for-recognizer-bot:

    yc serverless function create --name=for-recognizer-bot
    

    Result:

    id: b09bhaokchn9********
    folder_id: aoek49ghmknn********
    created_at: "2023-03-21T10:03:37.475Z"
    name: for-recognizer-bot
    log_group_id: eolm8aoq9vcp********
    http_invoke_url: https://functions.yandexcloud.net/b09bhaokchn9********
    status: ACTIVE
    
  2. Create a version of the for-recognizer-bot function:

    yc serverless function version create \
      --function-name for-recognizer-bot \
      --memory=128m \
      --execution-timeout=30s \
      --runtime=python312 \
      --entrypoint=index.handler \
      --service-account-id=<service_account_ID> \
      --environment TELEGRAM_TOKEN=<bot_token> \
      --source-path=./index.zip
    

    Where:

    • --function-name: Name of the function whose version you are creating.
    • --memory: Amount of RAM.
    • --execution-timeout: Maximum running time of the function until timeout.
    • --runtime: Runtime environment.
    • --entrypoint: Entry point.
    • --service-account-id: recognizer-bot-sa service account ID.
    • --environment: Environment variables.
    • --source-path: Path to the index.zip archive.

    Result:

    done (1s)
    id: d4e6qqlh53nu********
    function_id: d4emc80mnp5n********
    created_at: "2023-03-22T16:49:41.800Z"
    runtime: python312
    entrypoint: index.handler
    resources:
      memory: "134217728"
    execution_timeout: 30s
    service_account_id: aje20nhregkc********
    image_size: "4096"
    status: ACTIVE
    tags:
      - $latest
    log_group_id: ckgmc3l93cl0********
    environment:
      TELEGRAM_TOKEN: <bot_token>
    log_options:
      folder_id: b1g86q4m5vej********
    
Terraform

  1. In the configuration file, describe the function parameters:

    resource "yandex_function" "for-recognizer-bot-function" {
      name               = "for-recognizer-bot"
      user_hash          = "first function"
      runtime            = "python312"
      entrypoint         = "index.handler"
      memory             = "128"
      execution_timeout  = "30"
      service_account_id = "aje20nhregkcvu******"
      environment = {
        TELEGRAM_TOKEN = "<bot_token>"
      }
      content {
        zip_filename = "./index.zip"
      }
    }
    

    Where:

    • name: Function name.
    • user_hash: Random string to identify the function version.
    • runtime: Function runtime environment.
    • entrypoint: Entry point.
    • memory: Amount of memory allocated for the function, in MB.
    • execution_timeout: Function execution timeout.
    • service_account_id: recognizer-bot-sa service account ID.
    • environment: Environment variables.
    • content: Path to the index.zip archive with the function source code.

    For more information about the yandex_function resource parameters, see the relevant provider documentation.

  2. Make sure the configuration files are correct.

    1. In the command line, go to the directory where you created the configuration file.

    2. Run a check using this command:

      terraform plan
      

    If you described the configuration correctly, the terminal will display a list of resources being created and their parameters. If the configuration contains any errors, Terraform will point them out.

  3. Deploy the cloud resources.

    1. If the configuration does not contain any errors, run this command:

      terraform apply
      
    2. Confirm creating the function: type yes in the terminal and press Enter.

API

To create a function, use the create REST API method for the Function resource or the FunctionService/Create gRPC API call.

To create a function version, use the createVersion REST API method for the Function resource or the FunctionService/CreateVersion gRPC API call.
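
The handler in index.py reads the Telegram update from the body field of the event it receives. The sketch below shows that step in isolation so you can try it locally without deploying. The extract_update helper is a hypothetical name, and handling of the isBase64Encoded flag is an assumption about the event format that index.py itself does not rely on.

```python
import base64
import json


def extract_update(event: dict) -> dict:
    """Return the Telegram update carried in the request body of an event."""
    body = event.get("body") or "{}"
    # Assumption: a base64-encoded body is marked with the isBase64Encoded flag
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


# A hand-written event that mimics a webhook call; not real gateway output
sample_event = {"body": '{"update_id": 1, "message": {"text": "hello"}}'}
print(extract_update(sample_event)["message"]["text"])
```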

Create an API gateway

The Telegram server will notify your bot of new messages using a webhook. The API gateway will accept these requests on the bot's behalf and forward them to the for-recognizer-bot function for processing.

Management console
  1. In the management console, select the folder where you want to create an API gateway.

  2. In the list of services, select API Gateway.

  3. Click Create API gateway.

  4. In the Name field, enter recognizer-bot-api-gw.

  5. In the Specification section, add the specification:

    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0
    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
    

    Where:

    • function_id: for-recognizer-bot function ID.
    • service_account_id: recognizer-bot-sa service account ID.
  6. Click Create.

  7. Select the created API gateway. Save the Default domain field value. You will need it later.

CLI

  1. Save the following specification to the spec.yaml file:

    openapi: 3.0.0
    info:
      title: Sample API
      version: 1.0.0
    paths:
      /for-recognizer-bot-function:
        post:
          x-yc-apigateway-integration:
            type: cloud_functions
            function_id: <function_ID>
            service_account_id: <service_account_ID>
          operationId: for-recognizer-bot-function
    

    Where:

    • function_id: for-recognizer-bot function ID.
    • service_account_id: recognizer-bot-sa service account ID.
  2. Run this command:

    yc serverless api-gateway create --name recognizer-bot-api-gw --spec=spec.yaml
    

    Where:

    • --name: API gateway name.
    • --spec: Specification file.

    Result:

    done (5s)
    id: d5d1ud9bli1e********
    folder_id: b1gc1t4cb638********
    created_at: "2023-09-25T16:01:48.926Z"
    name: recognizer-bot-api-gw
    status: ACTIVE
    domain: d5dm1lba80md********.i9******.apigw.yandexcloud.net
    log_group_id: ckgefpleo5eg********
    connectivity: {}
    log_options:
      folder_id: b1gc1t4cb638********
    

Terraform

To create an API gateway:

  1. Describe the parameters of the yandex_api_gateway resource in the configuration file:

    resource "yandex_api_gateway" "recognizer-bot-api-gw" {
      name        = "recognizer-bot-api-gw"
      spec = <<-EOT
        openapi: 3.0.0
        info:
          title: Sample API
          version: 1.0.0
    
        paths:
          /for-recognizer-bot-function:
            post:
              x-yc-apigateway-integration:
                type: cloud_functions
                function_id: <function_ID>
                service_account_id: <service_account_ID>
              operationId: for-recognizer-bot-function
      EOT
    }
    

    Where:

    • name: API gateway name.
    • spec: API gateway specification.

    For more information about the resource parameters in Terraform, see the relevant provider documentation.

  2. Make sure the configuration files are correct.

    1. In the command line, go to the directory where you created the configuration file.

    2. Run a check using this command:

      terraform plan
      

    If you described the configuration correctly, the terminal will display a list of resources being created and their parameters. If the configuration contains any errors, Terraform will point them out.

  3. Deploy the cloud resources.

    1. If the configuration does not contain any errors, run this command:

      terraform apply
      
    2. Confirm resource creation by typing yes in the terminal and pressing Enter.

API

To create an API gateway, use the create REST API method for the ApiGateway resource or the ApiGatewayService/Create gRPC API call.
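
Whichever method you use, the specification needs the function and service account IDs filled in. As an optional convenience, the sketch below renders the specification from this section with both IDs substituted. SPEC_TEMPLATE and render_spec are hypothetical names introduced for this example.

```python
SPEC_TEMPLATE = """\
openapi: 3.0.0
info:
  title: Sample API
  version: 1.0.0
paths:
  /for-recognizer-bot-function:
    post:
      x-yc-apigateway-integration:
        type: cloud_functions
        function_id: {function_id}
        service_account_id: {service_account_id}
      operationId: for-recognizer-bot-function
"""


def render_spec(function_id: str, service_account_id: str) -> str:
    """Fill the function and service account IDs into the specification."""
    return SPEC_TEMPLATE.format(
        function_id=function_id, service_account_id=service_account_id
    )
```

Writing the result of render_spec to spec.yaml gives you a file you can pass to the yc serverless api-gateway create command through the --spec option.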

Configure a link between the function and the Telegram bot

Set a webhook for your Telegram bot:

curl --request POST \
  --url https://api.telegram.org/bot<bot_token>/setWebhook \
  --header 'content-type: application/json' \
  --data '{"url": "<API_gateway_domain>/for-recognizer-bot-function"}'

Where:

  • <bot_token>: Telegram bot token.
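  • <API_gateway_domain>: Service domain of the recognizer-bot-api-gw API gateway. Telegram accepts only HTTPS webhook URLs, so add the https:// prefix if the value you saved does not include it.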

Result:

{"ok":true,"result":true,"description":"Webhook was set"}
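
The same request can be prepared in Python. This is a minimal sketch with the standard library; build_webhook_request is a hypothetical helper, and it assumes the gateway path is /for-recognizer-bot-function as in the specification in this tutorial.

```python
import json
import urllib.request


def build_webhook_request(token: str, gateway_domain: str) -> urllib.request.Request:
    """Build the POST request that registers the API gateway as the webhook."""
    # Prepend the scheme if only the bare domain was saved
    if "://" not in gateway_domain:
        gateway_domain = "https://" + gateway_domain
    webhook_url = gateway_domain.rstrip("/") + "/for-recognizer-bot-function"
    payload = json.dumps({"url": webhook_url}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/setWebhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Building the request separately makes it easy to inspect the URL and payload before anything is sent to Telegram.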

Test the bot

Talk to the bot:

  1. Open Telegram and search for the bot by the specified username.

  2. Send the /start message to the chat.

    The bot should respond with:

    The bot can do the following:
    
    * Recognize text from images.
    * Generate voice messages from text.
    * Convert voice messages to text.
    
  3. Send a text message to the chat. The bot will respond with a voice message synthesized from your text.

  4. Send a voice message to the chat. The bot will respond with a message containing the text recognized from your speech.

  5. Send an image with text to the chat. The bot will respond with a message containing the recognized text.

    Note

    The image must meet the requirements.
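
To see how the reply to an image is assembled, the sketch below flattens a response of the shape that index.py expects (blocks, then lines, then words) into plain text. The extract_text helper is a hypothetical name, and the sample dictionary is hand-written for illustration; it is not real Vision OCR output.

```python
def extract_text(ocr_response: dict) -> str:
    """Flatten blocks -> lines -> words of an OCR response into plain text."""
    blocks = (
        ocr_response.get("result", {})
        .get("textAnnotation", {})
        .get("blocks", [])
    )
    lines = []
    for block in blocks:
        for line in block.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return "\n".join(lines)


# Hand-written sample in the shape index.py reads; not real API output
sample = {
    "result": {
        "textAnnotation": {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"text": "Hello"}, {"text": "world"}]},
                        {"words": [{"text": "Yandex"}, {"text": "Cloud"}]},
                    ]
                }
            ]
        }
    }
}
print(extract_text(sample))
```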

How to delete the resources you created

To stop paying for the resources you created:

  • Delete the API Gateway API gateway.
  • Delete the Cloud Functions function.
