Speech synthesis in the API v3
With the SpeechKit API v3, you can synthesize speech from text in TTS markup to a WAV file.
The example uses the following synthesis parameters:
- Synthesized audio file format: LPCM with a sample rate of 22050 Hz, WAV container (default).
- Volume normalization: LUFS (default).
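
For reference, the fragment below sketches how these defaults look when set explicitly in a synthesis request. It is only an illustration: it reuses the `tts_pb2` module generated later in this guide, the request fields mirror the full client code shown below, and the sample text is a placeholder.

```python
# A minimal sketch (not a complete program) showing the example's synthesis
# parameters set explicitly; field names mirror the full client code below.
import yandex.cloud.ai.tts.v3.tts_pb2 as tts_pb2  # generated later in this guide

request = tts_pb2.UtteranceSynthesisRequest(
    text='Hello from Speech+Kit!',  # placeholder text in TTS markup
    # WAV container for the output audio (the default described above).
    output_audio_spec=tts_pb2.AudioFormatOptions(
        container_audio=tts_pb2.ContainerAudio(
            container_audio_type=tts_pb2.ContainerAudio.WAV
        )
    ),
    # LUFS volume normalization (the default described above).
    loudness_normalization_type=tts_pb2.UtteranceSynthesisRequest.LUFS,
)
```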
To convert and record the result, you will need the `grpcio-tools` and `pydub` packages and the FFmpeg utility.
Authentication is performed under a service account using an API key or IAM token. To learn more about SpeechKit API authentication, see Authentication with the SpeechKit API.
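
Whichever option you choose, the credential is passed as a single `authorization` metadata entry on each gRPC call. The sketch below shows both variants; it mirrors the full example later in this section, and the values in angle brackets are placeholders.

```python
# A minimal sketch of the two authentication variants for SpeechKit API v3
# gRPC calls; pass exactly one of the metadata tuples with each request.
import grpc

iam_token = '<service_account_IAM_token>'   # placeholder
api_key = '<service_account_API_key>'       # placeholder

# TLS channel to the synthesis endpoint (same as in the full example).
cred = grpc.ssl_channel_credentials()
channel = grpc.secure_channel('tts.api.cloud.yandex.net:443', cred)

iam_metadata = (('authorization', f'Bearer {iam_token}'),)     # IAM token
api_key_metadata = (('authorization', f'Api-Key {api_key}'),)  # API key
# For example: stub.UtteranceSynthesis(request, metadata=iam_metadata)
```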
To implement the example:
1. Create a service account to work with the SpeechKit API.
2. Assign the service account the `ai.speechkit-tts.user` role or higher for the folder where it was created.
3. Create a client application:

**Python 3**
1. Clone the Yandex Cloud API repository:

   ```bash
   git clone https://github.com/yandex-cloud/cloudapi
   ```
2. Install the `grpcio-tools` and `pydub` packages using the pip package manager:

   ```bash
   pip install grpcio-tools && \
   pip install pydub
   ```

   You need the `grpcio-tools` package to generate the interface code for the API v3 synthesis client, and the `pydub` package to process the resulting audio files.
3. Download FFmpeg for correct operation of the `pydub` package. Add the path to the folder with the FFmpeg executable to the `PATH` variable. To do this, run the following command:

   ```bash
   export PATH=$PATH:<path_to_folder_with_FFmpeg_executable>
   ```
4. Go to the folder hosting the cloned Yandex Cloud API repository, create a folder named `output`, and generate the client interface code there:

   ```bash
   cd <path_to_cloudapi_folder>
   mkdir output
   python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
     --python_out=output \
     --grpc_python_out=output \
       google/api/http.proto \
       google/api/annotations.proto \
       yandex/cloud/api/operation.proto \
       google/rpc/status.proto \
       yandex/cloud/operation/operation.proto \
       yandex/cloud/validation.proto \
       yandex/cloud/ai/tts/v3/tts_service.proto \
       yandex/cloud/ai/tts/v3/tts.proto
   ```

   This will create the `tts_pb2.py`, `tts_pb2_grpc.py`, `tts_service_pb2.py`, and `tts_service_pb2_grpc.py` client interface files, as well as dependency files, in the `output` folder.
5. Create a file (e.g., `test.py`) in the `output` folder root and add the following code to it:

   ```python
   import io
   import grpc
   import pydub
   import argparse

   import yandex.cloud.ai.tts.v3.tts_pb2 as tts_pb2
   import yandex.cloud.ai.tts.v3.tts_service_pb2_grpc as tts_service_pb2_grpc


   # Specify the synthesis settings.
   # When authorizing with an API key, provide api_key instead of iam_token.
   # def synthesize(api_key, text) -> pydub.AudioSegment:
   def synthesize(iam_token, text) -> pydub.AudioSegment:
       request = tts_pb2.UtteranceSynthesisRequest(
           text=text,
           output_audio_spec=tts_pb2.AudioFormatOptions(
               container_audio=tts_pb2.ContainerAudio(
                   container_audio_type=tts_pb2.ContainerAudio.WAV
               )
           ),
           # Synthesis parameters
           hints=[
               tts_pb2.Hints(voice='alexander'),  # (Optional) Specify the voice. The default value is `marina`.
               tts_pb2.Hints(role='good'),        # (Optional) Specify the role only if applicable for this voice.
               tts_pb2.Hints(speed=1.1),          # (Optional) Specify the synthesis speed.
           ],
           loudness_normalization_type=tts_pb2.UtteranceSynthesisRequest.LUFS,
       )

       # Establish a connection with the server.
       cred = grpc.ssl_channel_credentials()
       channel = grpc.secure_channel('tts.api.cloud.yandex.net:443', cred)
       stub = tts_service_pb2_grpc.SynthesizerStub(channel)

       # Send data for synthesis.
       it = stub.UtteranceSynthesis(request, metadata=(
           # Parameters for authentication with an IAM token
           ('authorization', f'Bearer {iam_token}'),
           # Parameters for authentication with an API key as a service account
           # ('authorization', f'Api-Key {api_key}'),
       ))

       # Create an audio file out of the received chunks.
       try:
           audio = io.BytesIO()
           for response in it:
               audio.write(response.audio_chunk.data)
           audio.seek(0)
           return pydub.AudioSegment.from_wav(audio)
       except grpc._channel._Rendezvous as err:
           print(f'Error code {err._state.code}, message: {err._state.details}')
           raise err


   if __name__ == '__main__':
       parser = argparse.ArgumentParser()
       parser.add_argument('--token', required=True, help='IAM token or API key')
       parser.add_argument('--text', required=True, help='Text for synthesis')
       parser.add_argument('--output', required=True, help='Output file')
       args = parser.parse_args()

       audio = synthesize(args.token, args.text)
       with open(args.output, 'wb') as fp:
           audio.export(fp, format='wav')
   ```
6. Execute the file from the previous step:

   ```bash
   export IAM_TOKEN=<service_account_IAM_token>
   export TEXT='I'\''m Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!'
   python3 output/test.py \
     --token ${IAM_TOKEN} \
     --output speech.wav \
     --text "${TEXT}"
   ```

   Where:

   - `IAM_TOKEN`: Service account IAM token. If you use an API key for authentication under a service account, change the Python script and the program call accordingly.
   - `TEXT`: Text for synthesis in TTS markup.
   - `--output`: Name of the output audio file.
As a result, a file named `speech.wav` with synthesized speech will be created in the `cloudapi` folder.
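
If you want to sanity-check the synthesized audio, a short `pydub` snippet like the one below can report its basic properties and, optionally, convert it to another container. This is just a sketch: the file name `speech.wav` matches the example above, and the MP3 export relies on the FFmpeg installation from the earlier step.

```python
# check_speech.py: a minimal sketch for inspecting the synthesized file.
# Assumes speech.wav produced by the example above and FFmpeg on PATH.
import pydub

audio = pydub.AudioSegment.from_wav('speech.wav')

# Basic properties of the synthesized audio.
print(f'Duration:    {audio.duration_seconds:.2f} s')
print(f'Sample rate: {audio.frame_rate} Hz')
print(f'Channels:    {audio.channels}')

# Optional: re-export to another container, e.g. MP3 (handled via FFmpeg).
audio.export('speech.mp3', format='mp3')
```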
**Java**

1. Install the dependencies:

   ```bash
   sudo apt update && sudo apt install --yes default-jdk maven
   ```
2. Clone the repository with the Java application configuration:

   ```bash
   git clone https://github.com/yandex-cloud-examples/yc-speechkit-tts-java
   ```
3. Go to the repository directory:

   ```bash
   cd yc-speechkit-tts-java
   ```
4. Compile the project in this directory:

   ```bash
   mvn clean install
   ```
5. Go to the `target` directory created during compilation:

   ```bash
   cd target
   ```
6. Specify the service account's API key and the text to synthesize:

   ```bash
   export API_KEY=<API_key> && \
   export TEXT='I'\''m Yandex Speech+Kit. I can turn any text into speech. Now y+ou can, too!'
   ```
7. Run the Java client for speech synthesis:

   ```bash
   java -cp speechkit_examples-1.0-SNAPSHOT.jar yandex.cloud.speechkit.examples.TtsV3Client "${TEXT}"
   ```
As a result, the `result.wav` audio file should appear in the `target` directory. It contains speech synthesized from the text in the `TEXT` environment variable.