
Speech synthesis in the API v3

Written by
Yandex Cloud
Updated on March 31, 2025

With the SpeechKit API v3, you can synthesize speech from text in TTS markup to a WAV file.

The example uses the following synthesis parameters:

  • Synthesized audio file format: LPCM with a sample rate of 22050 Hz, WAV container (default).
  • Volume normalization: LUFS (default).

To convert and record the result, you will need the grpcio-tools and pydub packages and the FFmpeg utility.

Authentication is performed under a service account using an API key or IAM token. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
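Either credential is passed as an `authorization` entry in the gRPC call metadata, which is what the client code below does. A minimal sketch of the two header formats (the helper name `auth_metadata` is illustrative, not part of the SDK):

```python
def auth_metadata(iam_token=None, api_key=None):
    """Build the gRPC call metadata for either SpeechKit authentication mode."""
    if api_key is not None:
        # Authentication with an API key as a service account
        return (('authorization', f'Api-Key {api_key}'),)
    # Authentication with an IAM token
    return (('authorization', f'Bearer {iam_token}'),)
```

Only one of the two headers is sent per call; the example script below keeps the unused variant as a commented-out line.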

To implement an example:

  1. Create a service account to work with the SpeechKit API.

  2. Assign the service account the ai.speechkit-tts.user role or higher for the folder where it was created.

  3. Get an API key or IAM token for your service account.

  4. Create a client application:

    Python 3
    1. Clone the Yandex Cloud API repository:

      git clone https://github.com/yandex-cloud/cloudapi
      
    2. Install the grpcio-tools and pydub packages using the pip package manager:

      pip install grpcio-tools && \
      pip install pydub
      

      You need the grpcio-tools package to generate the interface code for the API v3 synthesis client. You need the pydub package to process the resulting audio files.
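The server returns synthesized audio as a stream of binary chunks, which the client buffers in memory before pydub parses the result. A stdlib-only sketch of that buffering pattern, using stand-in byte fragments instead of real audio:

```python
import io

# Stand-in fragments; in the real client each one is response.audio_chunk.data.
chunks = [b'RIFF$\x00\x00\x00', b'WAVEfmt ']

audio = io.BytesIO()
for chunk in chunks:
    audio.write(chunk)
audio.seek(0)  # rewind so a reader such as pydub.AudioSegment.from_wav can parse it

print(audio.getvalue()[:4])  # the buffer starts with the WAV magic bytes b'RIFF'
```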

    3. Download FFmpeg, which the pydub package requires to process audio, and add the folder with the FFmpeg executable to the PATH variable:

      export PATH=$PATH:<path_to_folder_with_FFmpeg_executable>
      
    4. Go to the folder hosting the cloned Yandex Cloud API repository, create a folder named output, and generate the client interface code there:

      cd <path_to_cloudapi_folder>
      mkdir output
      python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
        --python_out=output \
        --grpc_python_out=output \
        google/api/http.proto \
        google/api/annotations.proto \
        yandex/cloud/api/operation.proto \
        google/rpc/status.proto \
        yandex/cloud/operation/operation.proto \
        yandex/cloud/validation.proto \
        yandex/cloud/ai/tts/v3/tts_service.proto \
        yandex/cloud/ai/tts/v3/tts.proto
      

      This will create the tts_pb2.py, tts_pb2_grpc.py, tts_service_pb2.py, and tts_service_pb2_grpc.py client interface files, as well as dependency files, in the output folder.
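Note that running `python3 output/test.py` works because Python prepends the script's own directory (output) to sys.path, so the generated yandex.cloud packages resolve. If you instead import the generated modules from another entry point, you would need to add the folder yourself; a sketch:

```python
import os
import sys

# Make the generated tts_pb2*.py modules importable from any working directory.
sys.path.insert(0, os.path.abspath('output'))

print(sys.path[0])  # ends with 'output'
```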

    5. Create a file (e.g., test.py) in the output folder root and add the following code to it:

      import io
      import grpc
      import pydub
      import argparse
      
      import yandex.cloud.ai.tts.v3.tts_pb2 as tts_pb2
      import yandex.cloud.ai.tts.v3.tts_service_pb2_grpc as tts_service_pb2_grpc
      
      # Specify the synthesis settings.
      # When authorizing with an API key, provide api_key instead of iam_token.
      #def synthesize(api_key, text) -> pydub.AudioSegment:
      def synthesize(iam_token, text) -> pydub.AudioSegment:
          request = tts_pb2.UtteranceSynthesisRequest(
              text=text,
              output_audio_spec=tts_pb2.AudioFormatOptions(
                  container_audio=tts_pb2.ContainerAudio(
                      container_audio_type=tts_pb2.ContainerAudio.WAV
                  )
              ),
              # Synthesis parameters
              hints=[
                  tts_pb2.Hints(voice='alexander'),  # (Optional) Specify the voice; the default is `marina`.
                  tts_pb2.Hints(role='good'),  # (Optional) Specify the role only if this voice supports it.
                  tts_pb2.Hints(speed=1.1),  # (Optional) Specify the synthesis speed.
              ],
      
              loudness_normalization_type=tts_pb2.UtteranceSynthesisRequest.LUFS
          )
      
          # Establish a connection with the server.
          cred = grpc.ssl_channel_credentials()
          channel = grpc.secure_channel('tts.api.cloud.yandex.net:443', cred)
          stub = tts_service_pb2_grpc.SynthesizerStub(channel)
      
          # Send data for synthesis.
          it = stub.UtteranceSynthesis(request, metadata=(
              # Authentication with an IAM token
              ('authorization', f'Bearer {iam_token}'),
              # Authentication with an API key as a service account
              # ('authorization', f'Api-Key {api_key}'),
          ))
      
          # Create an audio file out of chunks.
          try:
              audio = io.BytesIO()
              for response in it:
                  audio.write(response.audio_chunk.data)
              audio.seek(0)
              return pydub.AudioSegment.from_wav(audio)
          except grpc.RpcError as err:
              # grpc.RpcError is the public exception type; the private
              # grpc._channel._Rendezvous class should not be caught directly.
              print(f'Error code {err.code()}, message: {err.details()}')
              raise
      
      
      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          parser.add_argument('--token', required=True, help='IAM token or API key')
          parser.add_argument('--text', required=True, help='Text for synthesis')
          parser.add_argument('--output', required=True, help='Output file')
          args = parser.parse_args()
      
          audio = synthesize(args.token, args.text)
          with open(args.output, 'wb') as fp:
              audio.export(fp, format='wav')
      
    6. Execute the file from the previous step:

      export IAM_TOKEN=<service_account_IAM_token>
      export TEXT="I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"
      python3 output/test.py \
        --token "${IAM_TOKEN}" \
        --output speech.wav \
        --text "${TEXT}"
      

      Where:

      • IAM_TOKEN: Service account IAM token. If you authenticate with an API key instead, uncomment the API key lines in the Python script and update the program call accordingly.
      • TEXT: Text for synthesis in TTS markup.
      • --output: Name of the file for the audio.

      As a result, a file named speech.wav with synthesized speech will be created in the cloudapi folder.
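To confirm that the output container matches the requested format, you can inspect the WAV header with the stdlib wave module. A sketch, assuming the default output is 16-bit mono LPCM at 22050 Hz; the demo writes a stand-in file, whereas in practice you would pass 'speech.wav':

```python
import struct
import wave

def wav_params(path):
    # Read the container header: (sample rate, channels, bytes per sample).
    with wave.open(path, 'rb') as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# Write a stand-in file with the same parameters the example requests.
with wave.open('demo.wav', 'wb') as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit LPCM
    w.setframerate(22050)    # sample rate from the synthesis settings
    w.writeframes(struct.pack('<h', 0) * 22050)  # one second of silence

print(wav_params('demo.wav'))  # (22050, 1, 2)
```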

    Java

    1. Install the dependencies:

      sudo apt update && sudo apt install --yes default-jdk maven
      
    2. Clone the repository with a Java application configuration:

      git clone https://github.com/yandex-cloud-examples/yc-speechkit-tts-java
      
    3. Go to the repository directory:

      cd yc-speechkit-tts-java
      
    4. Compile a project in this directory:

      mvn clean install
      
    5. Go to the target directory you created:

      cd target
      
    6. Specify the service account's API key and text to synthesize:

      export API_KEY=<API_key> && \
      export TEXT="I'm Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!"
      
    7. Run the Java script for speech synthesis:

      java -cp speechkit_examples-1.0-SNAPSHOT.jar yandex.cloud.speechkit.examples.TtsV3Client "${TEXT}"
      

      As a result, the result.wav audio file should appear in the target directory. It contains speech recorded from the TEXT environment variable.

See also

  • Learn more about the API v3
  • Authentication with the SpeechKit API
  • Speech synthesis using the Python SDK

© 2025 Direct Cursus Technology L.L.C.