© 2025 Direct Cursus Technology L.L.C.
Yandex SpeechKit

Streaming speech recognition with auto language detection in the API v3

Written by
Yandex Cloud
Improved by
Dmitry A.
Updated at April 30, 2025

This example shows how to recognize LPCM speech in real time using the SpeechKit API v3 with automatic language detection.

The example uses the following parameters:

  • Recognition language: auto (automatic language detection).
  • Format of the audio stream: LPCM with a sampling rate of 8000 Hz.
  • Number of audio channels: 1 (default).
  • Other parameters are left at their defaults.
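With these settings, the byte rate of the stream follows directly from the audio format. Assuming LINEAR16_PCM stores each sample as 16 bits (2 bytes), one second of 8000 Hz mono audio is 8000 × 2 × 1 = 16000 bytes, so the 4000-byte CHUNK_SIZE used in the client code corresponds to 0.25 s of audio. A minimal sketch of that arithmetic:

```python
# Assumption: LINEAR16_PCM stores each sample as 16 bits (2 bytes).
SAMPLE_RATE_HZ = 8000      # sample_rate_hertz in the example
BYTES_PER_SAMPLE = 2       # 16-bit linear PCM
CHANNELS = 1               # audio_channel_count in the example


def bytes_per_second(rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Number of bytes that one second of raw PCM audio occupies."""
    return rate_hz * bytes_per_sample * channels


def chunk_duration_seconds(chunk_size: int) -> float:
    """Seconds of audio contained in a chunk of chunk_size bytes."""
    return chunk_size / bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)


if __name__ == "__main__":
    print(bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS))  # 16000
    print(chunk_duration_seconds(4000))  # 0.25
```

Smaller chunks reduce the delay before the server sees new audio at the cost of more messages; the example's 4000-byte size is a middle ground.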

Automatic language detection

When automatic language detection is enabled, SpeechKit detects the language of each sentence during speech recognition.

To enable automatic language detection, set restriction_type to WHITELIST and language_code to auto in the LanguageRestrictionOptions message:

Python 3
language_restriction=stt_pb2.LanguageRestrictionOptions(
      restriction_type=stt_pb2.LanguageRestrictionOptions.WHITELIST,
      language_code=['auto']
)

Along with the recognition results, the service returns language labels, each containing a language code and the probability that the language was detected correctly:

language_code: "ru-RU" probability: 0.91582357883453369
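A final alternative can carry several language labels. The following minimal sketch picks the most probable one. To keep it self-contained, a hypothetical LanguageLabel dataclass stands in for the generated message, assuming, as the output above suggests, that each label exposes language_code and probability fields. With the real stubs, you would pass alternative.languages instead; the label values below are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LanguageLabel:
    """Hypothetical stand-in for a language label returned by the service."""
    language_code: str
    probability: float


def most_probable_language(labels: Iterable[LanguageLabel]) -> Optional[LanguageLabel]:
    """Return the label with the highest probability, or None if there are none."""
    return max(labels, key=lambda label: label.probability, default=None)


if __name__ == "__main__":
    labels = [
        LanguageLabel("ru-RU", 0.91582357883453369),
        LanguageLabel("en-US", 0.05),  # illustrative value
    ]
    best = most_probable_language(labels)
    print(best.language_code, round(best.probability, 2))  # ru-RU 0.92
```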

If a sentence contains words in different languages, the language may be detected incorrectly. To improve the results, provide a list of expected languages as a hint for the model. For example:

Python 3
...
      language_code=['auto', 'en-US', 'es-ES', 'fr-FR']
...
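If the expected languages come from configuration or user input, a small helper can keep auto in the list and drop duplicates before the list is passed as language_code to LanguageRestrictionOptions. The sketch below is a hypothetical helper for illustration, not part of the SpeechKit API:

```python
from typing import Iterable, List


def build_language_codes(expected: Iterable[str]) -> List[str]:
    """Hypothetical helper: 'auto' followed by unique expected language codes.

    The order of the hints is preserved; duplicates and any extra 'auto'
    entries are dropped.
    """
    codes = ["auto"]
    for code in expected:
        if code not in codes:
            codes.append(code)
    return codes


if __name__ == "__main__":
    print(build_language_codes(["en-US", "es-ES", "fr-FR", "en-US"]))
    # ['auto', 'en-US', 'es-ES', 'fr-FR']
```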

Note

Language detection and language labels are only available in the gRPC API v3.

Examples

Text in audio | Transcript
Xiaomi is a Chinese brand | shumi is a chinese brand
Привет is hi in Russian | privet is hi in russian
Men koʻchada sayr qilishni va muzqaymoq isteʼmol qilishni yaxshi koʻraman, I like to take a walk outside and have some ice cream | Men koʻchada sayr qilishni va muzqaymoq isteʼmol qilishni yaxshi koʻraman, I like to take a walk outside and have some ice cream

Prepare the required resources

  1. Create a service account and assign the ai.speechkit-stt.user role to it.
  2. Get an IAM token for the service account and save it.
  3. Download a sample audio file for recognition or generate your own in LPCM format (8000 Hz, mono) to match the example settings.
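If you record your own sample as a WAV file, you can strip the WAV header to get raw LPCM for the example. The minimal sketch below uses Python's standard wave module and assumes the WAV file already contains 16-bit PCM at 8000 Hz in a single channel: it only checks these parameters and writes the raw frames; it does not resample or downmix. The function name wav_to_pcm is hypothetical.

```python
import wave


def wav_to_pcm(wav_path: str, pcm_path: str,
               expected_rate: int = 8000, expected_channels: int = 1) -> int:
    """Copy the PCM frames of a WAV file into a headerless .pcm file.

    Returns the number of bytes written. Raises ValueError if the WAV
    parameters do not match the recognition settings used in this example.
    """
    with wave.open(wav_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if src.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {src.getframerate()} Hz")
        if src.getnchannels() != expected_channels:
            raise ValueError(
                f"expected {expected_channels} channel(s), got {src.getnchannels()}")
        frames = src.readframes(src.getnframes())
    # Write only the raw sample data, without any WAV header.
    with open(pcm_path, "wb") as dst:
        dst.write(frames)
    return len(frames)
```

For example, wav_to_pcm("speech.wav", "speech.pcm") (hypothetical file names) produces a file you can pass to the client with --path.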

Create an application for streaming speech recognition

To run the example in this section:

  1. Clone the Yandex Cloud API repository:

    git clone https://github.com/yandex-cloud/cloudapi
    
  2. Create a client application:

    Python 3
    1. Use the pip package manager to install the grpcio-tools package:

      pip install grpcio-tools
      
    2. Go to the directory hosting the cloned Yandex Cloud API repository, create a directory named output, and generate the client interface code there:

      cd <path_to_cloudapi_directory>
      mkdir output
      python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
          --python_out=output \
          --grpc_python_out=output \
          google/api/http.proto \
          google/api/annotations.proto \
          yandex/cloud/api/operation.proto \
          google/rpc/status.proto \
          yandex/cloud/operation/operation.proto \
          yandex/cloud/validation.proto \
          yandex/cloud/ai/stt/v3/stt_service.proto \
          yandex/cloud/ai/stt/v3/stt.proto
      

      This will create the stt_pb2.py, stt_pb2_grpc.py, stt_service_pb2.py, and stt_service_pb2_grpc.py client interface files, as well as dependency files, in the output directory (the STT files end up under output/yandex/cloud/ai/stt/v3/).

    3. Create a file (e.g., test.py) in the root of the output directory, and add the following code to it:

      #coding=utf8
      import argparse
      
      import grpc
      
      import yandex.cloud.ai.stt.v3.stt_pb2 as stt_pb2
      import yandex.cloud.ai.stt.v3.stt_service_pb2_grpc as stt_service_pb2_grpc
      
      CHUNK_SIZE = 4000
      
      def gen(audio_file_name):
          # Specify recognition settings.
          recognize_options = stt_pb2.StreamingOptions(
              recognition_model=stt_pb2.RecognitionModelOptions(
                  audio_format=stt_pb2.AudioFormatOptions(
                      raw_audio=stt_pb2.RawAudio(
                          audio_encoding=stt_pb2.RawAudio.LINEAR16_PCM,
                          sample_rate_hertz=8000,
                          audio_channel_count=1
                      )
                  ),
                  # Specify automatic language detection.
                  language_restriction=stt_pb2.LanguageRestrictionOptions(
                      restriction_type=stt_pb2.LanguageRestrictionOptions.WHITELIST,
                      language_code=['auto']
                  ),
                  # Select the streaming recognition model.
                  audio_processing_type=stt_pb2.RecognitionModelOptions.REAL_TIME
              )
          )
      
          # Send a message with recognition settings.
          yield stt_pb2.StreamingRequest(session_options=recognize_options)
      
          # Read the audio file and send its contents in chunks.
          with open(audio_file_name, 'rb') as f:
              data = f.read(CHUNK_SIZE)
              while data != b'':
                  yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))
                  data = f.read(CHUNK_SIZE)
      
      # Provide api_key instead of iam_token when authenticating with an API key
      # as a service account.
      # def run(api_key, audio_file_name):
      def run(iam_token, audio_file_name):
          # Establish a connection with the server.
          cred = grpc.ssl_channel_credentials()
          channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
          stub = stt_service_pb2_grpc.RecognizerStub(channel)
      
          # Send data for recognition.
          it = stub.RecognizeStreaming(gen(audio_file_name), metadata=(
          # Parameters for authentication with an IAM token
              ('authorization', f'Bearer {iam_token}'),
          # Parameters for authentication with an API key as a service account
          #   ('authorization', f'Api-Key {api_key}'),
          ))
      
          # Process the server responses and output the result to the console.
          try:
              for r in it:
                  event_type, alternatives = r.WhichOneof('Event'), None
                  if event_type == 'partial' and len(r.partial.alternatives) > 0:
                      alternatives = [a.text for a in r.partial.alternatives]
                  if event_type == 'final':
                      alternatives = [a.text for a in r.final.alternatives]
                      # Getting language labels:
                      langs = [a.languages for a in r.final.alternatives]
                  if event_type == 'final_refinement':
                      alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
                  print(f'type={event_type}, alternatives={alternatives}')
                  # Printing language labels to the console for final versions:
                  if event_type == 'final':
                      print('Language labels:')
                      for lang in langs:
                          for line in lang:
                              words=f'{line}'.splitlines()
                              for word in words:
                                  print(f'  {word}', end="")
                              print()
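          except grpc.RpcError as err:
              # grpc.RpcError is the public exception type for failed calls;
              # the private _Rendezvous class no longer exists in current grpcio.
              print(f'Error code {err.code()}, message: {err.details()}')
              raise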
      
      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          parser.add_argument('--token', required=True, help='IAM token or API key')
          parser.add_argument('--path', required=True, help='audio file path')
          args = parser.parse_args()
          run(args.token, args.path)
      

      Where:

      • audio_encoding: Audio stream format.
      • sample_rate_hertz: Audio stream sampling rate.
      • audio_channel_count: Number of audio channels.
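      • language_code: Recognition language; auto enables automatic language detection.
      • audio_processing_type: Processing type; REAL_TIME selects the streaming recognition model.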
  3. Save the service account's IAM token to an environment variable:

    export IAM_TOKEN=<service_account_IAM_token>
    
  4. Run the created file:

    python3 output/test.py --token ${IAM_TOKEN} --path <path_to_speech.pcm_file>
    

    Where --path is the path to the audio file for recognition.

    Result:

    type=status_code, alternatives=None
    type=partial, alternatives=None
    type=partial, alternatives=['hello']
    type=final, alternatives=['hello world']
    Language labels:
        language_code: "ru-RU"  probability: 1
    type=final_refinement, alternatives=['hello world']
    type=eou_update, alternatives=None
    type=partial, alternatives=None
    type=status_code, alternatives=None
    
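The gen() function in test.py reads the audio file in 4000-byte chunks with an explicit while loop. If you want to reuse or test that step on its own, the same logic can be written with iter() and a b'' sentinel over any binary file-like object. This is a refactoring sketch, not part of the original example; read_chunks is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4000


def read_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) keeps calling stream.read() until it returns b''.
    yield from iter(lambda: stream.read(chunk_size), b"")


if __name__ == "__main__":
    sizes = [len(c) for c in read_chunks(io.BytesIO(b"\x00" * 10000))]
    print(sizes)  # [4000, 4000, 2000]
```

Inside gen(), the reading loop would then become a for loop over read_chunks(f) that yields one StreamingRequest with an AudioChunk per chunk, exactly as the original loop does.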

See also

  • API v3 reference
  • Authentication with the SpeechKit API
  • Supported languages and recognition models
