Getting started with SpeechKit
You can test speech recognition and synthesis on the SpeechKit demo page. For information on pricing, see SpeechKit pricing policy.
Getting started
- Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet. For information on how to get started with Yandex Cloud, see Getting started with Yandex Cloud.
- Accept the user agreement.
- In Yandex Cloud Billing, make sure you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.
Speech recognition using Playground
To recognize speech from an audio file via the SpeechKit Playground interface:
- Open the management console and select SpeechKit.
- In the left-hand panel, click SpeechKit Playground and go to the Speech recognition tab.
- In the Language field, select the language you need or leave Automatic.
- Click Select file or drag the file to the loading area.
- Click Start recognition to start the speech recognition process.
SpeechKit Playground features basic speech recognition options. For more flexible recognition settings, use the API.
Speech synthesis using Playground
To convert text to audio via the SpeechKit Playground interface:
- Open the management console and select SpeechKit.
- In the left-hand panel, click SpeechKit Playground and go to the Speech synthesis tab.
- Under Synthesis settings:
  - Pauses: Select the length of pauses between words or specify it yourself.
  - Emphasize word: Emphasize the essential words.
  - Stress: Mark the stressed vowels to clarify the correct pronunciation of the words.
  - Phonemes: Monitor the correct pronunciation of words using phonemes.
- Under Synthesis settings:
  - Language: Select the speaker's language.
  - Voice: Specify the speaker's voice.
  - Role: Select the speaker's role.
  - Speech speed: Set the speaker's speech rate.
  - Voice pitch: Adjust the speaker's voice pitch.
  - Audio format: Select the audio format.
- To synthesize the text, click Synthesize and playback.
- To download the result, click the download icon.
SpeechKit Playground features basic speech synthesis options. For more flexible synthesis settings, use the API.
Authentication for API access
To work with the SpeechKit API, you need to authenticate. The authentication method depends on the account type.
With a Yandex account or federated account:
- Get an IAM token for your Yandex account or federated account.
- Get the ID of the folder for which your account has the ai.speechkit-stt.user, ai.speechkit-tts.user, or higher roles.
- When accessing SpeechKit via the API, provide the received parameters in each request:
  - For API v1 and API v2:
    - Specify the IAM token in the Authorization header in the following format: Authorization: Bearer <IAM_token>
    - Specify the folder ID in the request body in the folderId parameter.
  - For API v3:
    - Specify the IAM token in the Authorization header.
    - Specify the folder ID in the x-folder-id header.
    Authorization: Bearer <IAM_token>
    x-folder-id: <folder_ID>
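The rules above for a Yandex or federated account can be sketched in Python as follows. This is only an illustration: the function name build_user_auth and its return shape are hypothetical and not part of any SpeechKit SDK. It builds the header and body parameters described in this section and does not send a request.

```python
from typing import Dict, Tuple


def build_user_auth(
    iam_token: str, folder_id: str, api_version: int
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, body_params) for a user-account request.

    Hypothetical helper: it only mirrors the rules stated in this section.
    """
    if api_version not in (1, 2, 3):
        raise ValueError("api_version must be 1, 2, or 3")

    # Every API version takes the IAM token in the Authorization header.
    headers = {"Authorization": f"Bearer {iam_token}"}
    body_params: Dict[str, str] = {}

    if api_version == 3:
        # API v3: the folder ID goes into the x-folder-id header.
        headers["x-folder-id"] = folder_id
    else:
        # API v1 and v2: the folder ID goes into the folderId request parameter.
        body_params["folderId"] = folder_id

    return headers, body_params
```

For example, build_user_auth("<IAM_token>", "<folder_ID>", 3) returns both values as headers and an empty dictionary of body parameters.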
SpeechKit supports two authentication methods based on service accounts:
- With an IAM token:
  - Get an IAM token.
  - Provide the IAM token in the Authorization header in the following format: Authorization: Bearer <IAM_token>
- With API keys. Use API keys if requesting an IAM token automatically is not an option.
  - Provide the API key in the Authorization header in the following format: Authorization: Api-Key <API_key>
Do not specify the folder ID in your requests, as SpeechKit uses the folder in which the service account was created.
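As a rough sketch of the two service account options, the helper below picks the Authorization header format based on which credential is supplied. The name service_account_headers is hypothetical and used for illustration only. No folder ID is added, in line with the note above.

```python
from typing import Dict, Optional


def service_account_headers(
    iam_token: Optional[str] = None, api_key: Optional[str] = None
) -> Dict[str, str]:
    """Build the Authorization header for a service account request.

    Hypothetical helper: exactly one of iam_token or api_key must be given.
    """
    if (iam_token is None) == (api_key is None):
        raise ValueError("Provide exactly one of iam_token or api_key")

    if iam_token is not None:
        # IAM token: Authorization: Bearer <IAM_token>
        return {"Authorization": f"Bearer {iam_token}"}

    # API key: Authorization: Api-Key <API_key>
    return {"Authorization": f"Api-Key {api_key}"}
```

Passing both credentials, or neither, raises ValueError, so a caller cannot silently send an ambiguous header.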
Speech recognition via the API
Learn how to recognize short and long pre-recorded audio files in SpeechKit. The service also supports real-time speech recognition.
Speech synthesis via the API
Learn how to convert text to audio using the SpeechKit API v1 and API v3. API v3 provides more flexibility for speech synthesis setup. For more information about the differences between the API versions, see Synthesis options.