Yandex Cloud
© 2025 Direct Cursus Technology L.L.C.

Audio file streaming recognition using the API v3

Written by
Yandex Cloud
Updated at April 30, 2025

Below, we provide an example of streaming recognition of speech from an audio file using the SpeechKit API v3. This example uses the following parameters:

  • Language: Russian.
  • Format of the audio stream: LPCM with a sampling rate of 8000 Hz.
  • Number of audio channels: 1 (default).
  • Profanity filter: enabled.
  • Text normalization: enabled.
  • Other parameters are left at their defaults.
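These parameters determine how much data the client streams. The sketch below assumes 16-bit samples, which matches the `LINEAR16_PCM` encoding in the Python code further down. Under that assumption, one second of 8000 Hz mono audio is 8000 × 1 × 2 = 16,000 bytes, and a 4000-byte chunk (the `CHUNK_SIZE` value in that code) carries 250 ms of audio. The helper names are hypothetical.

```python
def lpcm_bytes_per_second(sample_rate_hertz: int, channels: int = 1,
                          bytes_per_sample: int = 2) -> int:
    """Bytes per second of raw LPCM audio (2 bytes per sample = 16-bit)."""
    return sample_rate_hertz * channels * bytes_per_sample


def chunk_duration_ms(chunk_size: int, sample_rate_hertz: int, channels: int = 1,
                      bytes_per_sample: int = 2) -> float:
    """Duration, in milliseconds, of the audio carried by one chunk of chunk_size bytes."""
    rate = lpcm_bytes_per_second(sample_rate_hertz, channels, bytes_per_sample)
    return 1000 * chunk_size / rate


if __name__ == "__main__":
    print(lpcm_bytes_per_second(8000))    # 16000
    print(chunk_duration_ms(4000, 8000))  # 250.0
```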

Requests are authenticated as a service account using either an API key or an IAM token. For details, see Authentication with the SpeechKit API.
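In the Python code below, the credential is sent in the `authorization` gRPC metadata entry. An IAM token uses the `Bearer` scheme and an API key uses the `Api-Key` scheme. The following hypothetical helper (not part of any Yandex Cloud SDK) sketches how that metadata could be built for either credential type:

```python
from typing import Tuple


def auth_metadata(credential: str, kind: str = "iam") -> Tuple[Tuple[str, str], ...]:
    """Build gRPC call metadata for SpeechKit authentication (hypothetical helper).

    kind="iam" uses the Bearer scheme for an IAM token;
    kind="api_key" uses the Api-Key scheme for a service account API key.
    """
    schemes = {"iam": "Bearer", "api_key": "Api-Key"}
    if kind not in schemes:
        raise ValueError(f"unknown credential kind: {kind!r}")
    return (("authorization", f"{schemes[kind]} {credential}"),)


if __name__ == "__main__":
    print(auth_metadata("my-token"))           # (('authorization', 'Bearer my-token'),)
    print(auth_metadata("my-key", "api_key"))  # (('authorization', 'Api-Key my-key'),)
```

The returned tuple has the same shape as the `metadata` argument that the example passes to `stub.RecognizeStreaming(...)`.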

To run the example:

  1. Create a service account to work with the SpeechKit API.

  2. Assign the service account the ai.speechkit-stt.user role or higher for the folder where it was created.

  3. Set up an environment and create a client application:

    The steps are given separately for Python 3 and for Java.

    Python 3

    1. Get an API key or IAM token for your service account.

    2. Download a sample audio file for recognition or use your own. The code below expects raw LPCM audio: 16-bit samples, 8000 Hz, one channel.

    3. Clone the Yandex Cloud API repository:

      git clone https://github.com/yandex-cloud/cloudapi
      
    4. Use the pip package manager to install the grpcio-tools package:

      pip install grpcio-tools
      
    5. Go to the directory containing the cloned Yandex Cloud API repository, create a directory named output, and generate the client interface code in it:

      cd <path_to_cloudapi_directory>
      mkdir output
      python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
        --python_out=output \
        --grpc_python_out=output \
          google/api/http.proto \
          google/api/annotations.proto \
          yandex/cloud/api/operation.proto \
          google/rpc/status.proto \
          yandex/cloud/operation/operation.proto \
          yandex/cloud/validation.proto \
          yandex/cloud/ai/stt/v3/stt_service.proto \
          yandex/cloud/ai/stt/v3/stt.proto
      

      This will create the stt_pb2.py, stt_pb2_grpc.py, stt_service_pb2.py, and stt_service_pb2_grpc.py client interface files, as well as dependency files, in the output directory.

    6. Create a file (e.g., test.py) in the root of the output directory, and add the following code to it:

      #coding=utf8
      import argparse
      
      import grpc
      
      import yandex.cloud.ai.stt.v3.stt_pb2 as stt_pb2
      import yandex.cloud.ai.stt.v3.stt_service_pb2_grpc as stt_service_pb2_grpc
      
      CHUNK_SIZE = 4000
      
      def gen(audio_file_name):
          # Specify recognition settings.
          recognize_options = stt_pb2.StreamingOptions(
              recognition_model=stt_pb2.RecognitionModelOptions(
                  audio_format=stt_pb2.AudioFormatOptions(
                      raw_audio=stt_pb2.RawAudio(
                          audio_encoding=stt_pb2.RawAudio.LINEAR16_PCM,
                          sample_rate_hertz=8000,
                          audio_channel_count=1
                      )
                  ),
                  text_normalization=stt_pb2.TextNormalizationOptions(
                      text_normalization=stt_pb2.TextNormalizationOptions.TEXT_NORMALIZATION_ENABLED,
                      profanity_filter=True,
                      literature_text=False
                  ),
                  language_restriction=stt_pb2.LanguageRestrictionOptions(
                      restriction_type=stt_pb2.LanguageRestrictionOptions.WHITELIST,
                      language_code=['ru-RU']
                  ),
                  audio_processing_type=stt_pb2.RecognitionModelOptions.REAL_TIME
              )
          )
      
          # Send a message with recognition settings.
          yield stt_pb2.StreamingRequest(session_options=recognize_options)
      
          # Read the audio file and send its contents in chunks.
          with open(audio_file_name, 'rb') as f:
              data = f.read(CHUNK_SIZE)
              while data != b'':
                  yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))
                  data = f.read(CHUNK_SIZE)
      
      # Provide api_key instead of iam_token when authorizing with an API key
      # as a service account.
      # def run(api_key, audio_file_name):
      def run(iam_token, audio_file_name):
          # Establish a connection with the server.
          cred = grpc.ssl_channel_credentials()
          channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
          stub = stt_service_pb2_grpc.RecognizerStub(channel)
      
          # Send data for recognition.
          it = stub.RecognizeStreaming(gen(audio_file_name), metadata=(
          # Parameters for authorization with an IAM token
              ('authorization', f'Bearer {iam_token}'),
          # Parameters for authorization with an API key as a service account
          #   ('authorization', f'Api-Key {api_key}'),
          ))
      
          # Process the server responses and output the result to the console.
          try:
              for r in it:
                  event_type, alternatives = r.WhichOneof('Event'), None
                  if event_type == 'partial' and len(r.partial.alternatives) > 0:
                      alternatives = [a.text for a in r.partial.alternatives]
                  if event_type == 'final':
                      alternatives = [a.text for a in r.final.alternatives]
                  if event_type == 'final_refinement':
                      alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
                  print(f'type={event_type}, alternatives={alternatives}')
          except grpc.RpcError as err:
              # Errors raised while iterating a streaming call also implement
              # grpc.Call, which provides code() and details().
              print(f'Error code {err.code()}, message: {err.details()}')
              raise
      
      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          parser.add_argument('--token', required=True, help='IAM token or API key')
          parser.add_argument('--path', required=True, help='audio file path')
          args = parser.parse_args()
          run(args.token, args.path)
      

      Where:

      • audio_encoding: Audio stream format.
      • sample_rate_hertz: Audio stream sampling rate.
      • audio_channel_count: Number of audio channels.
      • profanity_filter: Profanity filter.
      • literature_text: Flag to present the recognized text in a literary style.
      • language_code: Recognition language.
    7. Save the service account's IAM token you got earlier to the IAM_TOKEN environment variable:

      export IAM_TOKEN=<service_account_IAM_token>
      

      To authenticate with an API key instead, save the API key to the API_KEY environment variable, edit test.py as described in the code comments, and pass --token ${API_KEY} in the next step.

    8. Run the created file:

      python3 output/test.py --token ${IAM_TOKEN} --path <path_to_speech.pcm_file>
      

      Where --path is the path to the audio file for recognition.

      Result:

      type=status_code, alternatives=None
      type=partial, alternatives=None
      type=partial, alternatives=['hello wor']
      type=final, alternatives=['hello world']
      type=final_refinement, alternatives=['Hello world']
      type=eou_update, alternatives=None
      type=partial, alternatives=None
      type=status_code, alternatives=None
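The Python client declares the stream as raw LPCM (`LINEAR16_PCM`, 8000 Hz, one channel) and sends the file bytes unchanged, so the file should contain only PCM samples in that format. If your own recording is a WAV file, one option is to extract its PCM frames with Python's standard `wave` module and check the format first. The following is a minimal sketch; the function name and the command-line usage are hypothetical:

```python
import sys
import wave


def wav_to_lpcm(path: str, sample_rate_hertz: int = 8000, channels: int = 1) -> bytes:
    """Return the raw 16-bit PCM frames of a WAV file after checking its format."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != sample_rate_hertz:
            raise ValueError(f"expected {sample_rate_hertz} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != channels:
            raise ValueError(f"expected {channels} channel(s), got {wav.getnchannels()}")
        return wav.readframes(wav.getnframes())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 wav_to_lpcm.py input.wav speech.pcm
    with open(sys.argv[2], "wb") as out:
        out.write(wav_to_lpcm(sys.argv[1]))
```

The resulting .pcm file can then be passed to test.py with --path.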
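The gen() generator in the Python code sends one message with the session options first, then the audio in CHUNK_SIZE pieces until the file is exhausted. The same ordering can be sketched without the generated stubs. In this simplified illustration, plain tuples stand in for `StreamingRequest` messages, and `iter()` with a sentinel replaces the explicit `while` loop:

```python
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4000


def request_stream(options: dict, audio: BinaryIO,
                   chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, object]]:
    """Yield a settings message first, then the audio in fixed-size chunks.

    The tuples stand in for stt_pb2.StreamingRequest(session_options=...)
    and stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=...)).
    """
    yield ("session_options", options)
    # iter(callable, sentinel) keeps calling audio.read() until it returns b'' (end of file).
    for data in iter(lambda: audio.read(chunk_size), b""):
        yield ("chunk", data)


if __name__ == "__main__":
    fake_audio = io.BytesIO(b"\x00" * 10000)  # 10,000 bytes of silence
    # Prints the options, then chunk sizes 4000, 4000, 2000.
    for kind, payload in request_stream({"sample_rate_hertz": 8000}, fake_audio):
        print(kind, payload if kind == "session_options" else len(payload))
```

The `iter()` form stops on the same condition as the original `while data != b''` loop. The last chunk is simply shorter when the file size is not a multiple of the chunk size.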
      
    Java

    1. Create the service account's API key.

    2. Install the dependencies:

      sudo apt update && sudo apt install --yes openjdk-21-jdk-headless maven
      

      Building the project requires a JDK (not just a JRE), version 17 or higher.

    3. Clone the repository containing the Java example application:

      git clone https://github.com/yandex-cloud-examples/yc-speechkit-stt-java
      
    4. Navigate to the repository directory:

      cd yc-speechkit-stt-java
      
    5. Build the project:

      mvn clean install
      
    6. Go to the target directory created by the build:

      cd target
      
    7. Save the service account's API key to the API_KEY environment variable:

      export API_KEY=<API_key>
      
    8. Download a sample audio file in WAV format:

      wget https://storage.yandexcloud.net/doc-files/speech.wav
      
    9. Run the Java program for speech recognition:

      java -cp speechkit_examples-1.0-SNAPSHOT.jar yandex.cloud.speechkit.examples.SttV3Client speech.wav
      

      Result:

      sending  initial request
      Done sending
      Stt stream completed
      Recognized text is "i'm yandex speechkit i can turn any text into speech now you can too"
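The Java build needs Java 17 or higher. Running `java -version` shows the installed version, which the command prints to standard error. The following hypothetical Python helper sketches how to extract the major version from that output. It handles both the modern `"21.0.2"` numbering and the legacy `"1.8.0_392"` style:

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report versions as 1.x, for example "1.8.0_392".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` and parse its output; None if java cannot be run."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None
    return java_major_version(result.stderr)


if __name__ == "__main__":
    found = installed_java_major_version()
    if found is None:
        print("could not determine the Java version")
    elif found < 17:
        print(f"Java {found} found; Java 17 or higher is required")
    else:
        print(f"Java {found} found")
```

This checks only the runtime. Compiling with Maven also requires the JDK compiler.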
      

See also

  • Microphone speech streaming recognition using the API v3
  • Learn more about the API v3
  • Authentication with the SpeechKit API
