Synchronous speech recognition using the Python SDK
Below, we provide an example of synchronous speech recognition from an audio file using the SpeechKit Python SDK. This example uses the following parameters:
- Recognition model: General.
- Language: Russian.
To use the Python SDK, the yandex-speechkit package is required.
Authentication is performed under a service account using an API key or IAM token. Learn more about authentication in the SpeechKit API.
Getting started
- Create a service account and assign it the ai.speechkit-stt.user role.
- Get an API key for the service account and save it.
- Download a sample audio file for recognition or generate your own.
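One way to keep the saved API key out of your source files is to read it from an environment variable at startup. The sketch below is only an illustration: the variable name SPEECHKIT_API_KEY and the helper load_api_key are assumptions made for this example and are not defined by the SDK.

```python
import os


def load_api_key(env_var: str = "SPEECHKIT_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice for this sketch,
    not something the SDK defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key
```

With such a helper, the value returned by load_api_key() could be passed as api_key wherever the '<API key>' placeholder would otherwise be filled in by hand.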
Create an application for synchronous speech recognition
- Install the yandex-speechkit package using the pip package manager:

      pip install yandex-speechkit

  The installation was tested on Python 3.9. For the minimum allowed Python version, see the SDK website.

  If a grpcio-tools package version conflict occurs, see Resolving version conflicts during the installation of Python SDK.

- Create a file named test.py and add the following code to it:

      from argparse import ArgumentParser

      from speechkit import model_repository, configure_credentials, creds
      from speechkit.stt import AudioProcessingType

      # Authentication via an API key.
      configure_credentials(
          yandex_credentials=creds.YandexCredentials(
              api_key='<API key>'
          )
      )

      def recognize(audio):
          model = model_repository.recognition_model()

          # Set the recognition settings.
          model.model = 'general'
          model.language = 'ru-RU'
          model.audio_processing_type = AudioProcessingType.Full

          # Recognition of speech in the specified audio file and output of results to the console.
          result = model.transcribe_file(audio)
          for c, res in enumerate(result):
              print('=' * 80)
              print(f'channel: {c}\n\nraw_text:\n{res.raw_text}\n\nnorm_text:\n{res.normalized_text}\n')
              if res.has_utterances():
                  print('utterances:')
                  for utterance in res.utterances:
                      print(utterance)

      if __name__ == '__main__':
          parser = ArgumentParser()
          parser.add_argument('--audio', type=str, help='audio path', required=True)

          args = parser.parse_args()

          recognize(args.audio)
  Where:
  - api_key: API key for the service account.
  - audio: Path to the audio file to recognize.
  - model: Recognition model.
  - language: Recognition language.
  - audio_processing_type: Audio processing method.

    The Python SDK does not support streaming or asynchronous recognition, but you can simulate these features. To do this, set the audio_processing_type parameter in the test.py file to one of the following values:
    - AudioProcessingType.Stream for streaming recognition.
    - AudioProcessingType.Full for asynchronous recognition.
- Run the created file:

      python3 test.py --audio speech.pcm

  Where --audio is the path to the audio file to transcribe.

  The result contains recognized speech:
      channel: 0

      raw_text:
      i'm yandex speechkit i can turn any text into speech now you can too

      norm_text:
      I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too!

      utterances:
      I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too! [0.419, 6.379]
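If you want to store the recognition results rather than print them, a small helper could collect the same fields into JSON. This is a minimal sketch, not part of the SDK: results_to_json is a hypothetical helper, and it relies only on the attributes used in test.py (raw_text, normalized_text, has_utterances() and utterances). The SimpleNamespace object is a stand-in so the example runs without the SDK or an API key.

```python
import json
from types import SimpleNamespace


def results_to_json(results) -> str:
    """Serialize per-channel recognition results to a JSON string.

    Each result is expected to expose raw_text, normalized_text,
    has_utterances() and utterances, as used in test.py.
    """
    channels = []
    for channel, res in enumerate(results):
        # Utterances are converted with str(), mirroring print(utterance) in test.py.
        utterances = [str(u) for u in res.utterances] if res.has_utterances() else []
        channels.append({
            "channel": channel,
            "raw_text": res.raw_text,
            "normalized_text": res.normalized_text,
            "utterances": utterances,
        })
    # ensure_ascii=False keeps non-Latin text (for example, Russian) readable.
    return json.dumps(channels, ensure_ascii=False, indent=2)


# Stand-in object for demonstration only; real results would come from
# model.transcribe_file(audio).
fake = SimpleNamespace(
    raw_text="hello world",
    normalized_text="Hello, world!",
    utterances=["Hello, world! [0.0, 1.0]"],
    has_utterances=lambda: True,
)
print(results_to_json([fake]))
```

In test.py, the list returned by model.transcribe_file(audio) would take the place of the stand-in list.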