Audio file streaming recognition using the API v3
This section shows how to run streaming speech recognition on an audio file using the SpeechKit API v3. The example uses the following parameters:
- Language: Russian.
- Format of the audio stream: LPCM with a sampling rate of 8000 Hz.
- Number of audio channels: 1 (default).
- Profanity filter enabled.
- Other parameters left by default.
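To relate these audio parameters to the chunk size used by the Python client later in this section (`CHUNK_SIZE = 4000`), here is a small arithmetic sketch. It assumes 16-bit samples, which is what `LINEAR16_PCM` denotes; the helper name `chunk_duration_seconds` is hypothetical and not part of the SpeechKit API.

```python
def chunk_duration_seconds(chunk_bytes, sample_rate_hz=8000, channels=1, bytes_per_sample=2):
    """Return how many seconds of raw LPCM audio fit into chunk_bytes."""
    # Raw LPCM has no header or compression, so the byte rate is a plain product.
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return chunk_bytes / bytes_per_second


# 8000 Hz * 1 channel * 2 bytes = 16000 bytes per second of audio.
print(chunk_duration_seconds(4000))  # 0.25
```

Under these assumptions, each 4000-byte chunk carries a quarter of a second of audio.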
Authentication is performed under a service account using an API key or IAM token. Learn more about authentication in the SpeechKit API.
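The two authentication options differ only in the `authorization` metadata value sent with the gRPC call, as the Python client in this section shows (`Bearer` for an IAM token, `Api-Key` for an API key). The sketch below wraps that choice in one function; the name `auth_metadata` and its `use_api_key` flag are hypothetical helpers for illustration.

```python
def auth_metadata(secret, use_api_key=False):
    """Build gRPC call metadata for either an IAM token or an API key."""
    # 'Bearer' and 'Api-Key' are the two schemes used by the example client.
    scheme = 'Api-Key' if use_api_key else 'Bearer'
    return (('authorization', f'{scheme} {secret}'),)


print(auth_metadata('<IAM_token>'))
print(auth_metadata('<API_key>', use_api_key=True))
```

The returned tuple could be passed as the `metadata` argument of `stub.RecognizeStreaming` in place of the inline tuple.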
To run the example from this section:
- Create a service account to work with the SpeechKit API.
- Assign the ai.speechkit-stt.user role or higher to the service account. This allows it to work with SpeechKit in the folder where it was created.
- Download the sample audio file or use your own.
- Create a client application:
Python 3

- Clone the Yandex Cloud API repository:

    git clone https://github.com/yandex-cloud/cloudapi

- Install the grpcio-tools package using the pip package manager:

    pip install grpcio-tools

- Go to the folder hosting the cloned Yandex Cloud API repository, create a folder named output, and generate the client interface code there:

    cd <path_to_cloudapi_folder>
    mkdir output
    python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
      --python_out=output \
      --grpc_python_out=output \
      google/api/http.proto \
      google/api/annotations.proto \
      yandex/cloud/api/operation.proto \
      google/rpc/status.proto \
      yandex/cloud/operation/operation.proto \
      yandex/cloud/validation.proto \
      yandex/cloud/ai/stt/v3/stt_service.proto \
      yandex/cloud/ai/stt/v3/stt.proto
  As a result, the stt_pb2.py, stt_pb2_grpc.py, stt_service_pb2.py, and stt_service_pb2_grpc.py client interface files, as well as dependency files, will be created in the output directory.

- In the root of the output directory, create a file, e.g., test.py, and add the following code to it:

    #coding=utf8
    import argparse

    import grpc

    import yandex.cloud.ai.stt.v3.stt_pb2 as stt_pb2
    import yandex.cloud.ai.stt.v3.stt_service_pb2_grpc as stt_service_pb2_grpc

    CHUNK_SIZE = 4000

    def gen(audio_file_name):
        # Specify the recognition settings.
        recognize_options = stt_pb2.StreamingOptions(
            recognition_model=stt_pb2.RecognitionModelOptions(
                audio_format=stt_pb2.AudioFormatOptions(
                    raw_audio=stt_pb2.RawAudio(
                        audio_encoding=stt_pb2.RawAudio.LINEAR16_PCM,
                        sample_rate_hertz=8000,
                        audio_channel_count=1
                    )
                ),
                text_normalization=stt_pb2.TextNormalizationOptions(
                    text_normalization=stt_pb2.TextNormalizationOptions.TEXT_NORMALIZATION_ENABLED,
                    profanity_filter=True,
                    literature_text=False
                ),
                language_restriction=stt_pb2.LanguageRestrictionOptions(
                    restriction_type=stt_pb2.LanguageRestrictionOptions.WHITELIST,
                    language_code=['ru-RU']
                ),
                audio_processing_type=stt_pb2.RecognitionModelOptions.REAL_TIME
            )
        )

        # Send a message with recognition settings.
        yield stt_pb2.StreamingRequest(session_options=recognize_options)

        # Read the audio file and send its contents in chunks.
        with open(audio_file_name, 'rb') as f:
            data = f.read(CHUNK_SIZE)
            while data != b'':
                yield stt_pb2.StreamingRequest(chunk=stt_pb2.AudioChunk(data=data))
                data = f.read(CHUNK_SIZE)

    # When authorizing with an API key
    # as a service account, provide api_key instead of iam_token.
    # def run(api_key, audio_file_name):
    def run(iam_token, audio_file_name):
        # Establish a server connection.
        cred = grpc.ssl_channel_credentials()
        channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
        stub = stt_service_pb2_grpc.RecognizerStub(channel)

        # Send data for recognition.
        it = stub.RecognizeStreaming(gen(audio_file_name), metadata=(
            # Parameters for authorization with an IAM token
            ('authorization', f'Bearer {iam_token}'),
            # Parameters for authorization as a service account with an API key
            # ('authorization', f'Api-Key {api_key}'),
        ))

        # Process the server responses and output the result to the console.
        try:
            for r in it:
                event_type, alternatives = r.WhichOneof('Event'), None
                if event_type == 'partial' and len(r.partial.alternatives) > 0:
                    alternatives = [a.text for a in r.partial.alternatives]
                if event_type == 'final':
                    alternatives = [a.text for a in r.final.alternatives]
                if event_type == 'final_refinement':
                    alternatives = [a.text for a in r.final_refinement.normalized_text.alternatives]
                print(f'type={event_type}, alternatives={alternatives}')
        # Catch the public grpc.RpcError rather than a private grpc._channel class.
        except grpc.RpcError as err:
            print(f'Error code {err.code()}, message: {err.details()}')
            raise err

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--token', required=True, help='IAM token or API key')
        parser.add_argument('--path', required=True, help='audio file path')
        args = parser.parse_args()
        run(args.token, args.path)
  Where:
  - audio_encoding: Format of the audio stream.
  - sample_rate_hertz: Sampling rate of the audio stream.
  - audio_channel_count: Number of audio channels.
  - profanity_filter: Profanity filter.
  - literature_text: Flag to generate the recognized text in a literary style.
  - language_code: Recognition language.
- Use the IAM token of the service account:

    export IAM_TOKEN=<service_account_IAM_token>

- Run the created file:

    python3 output/test.py --token ${IAM_TOKEN} --path <path_to_speech.pcm>
  Where --path is the path to the audio file to recognize.

  Result:

    type=status_code, alternatives=None
    type=partial, alternatives=None
    type=partial, alternatives=['hello world']
    type=final, alternatives=['hello world']
    type=final_refinement, alternatives=['Hello world']
    type=eou_update, alternatives=None
    type=partial, alternatives=None
    type=status_code, alternatives=None
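The result above interleaves several event types. If only the normalized text is needed, one option is to keep the final_refinement events and ignore the rest. The sketch below works on plain `(event_type, alternatives)` pairs, mirroring what the example prints; the function name `collect_transcript` is hypothetical, and taking the first alternative of each event is an assumption rather than documented API behavior.

```python
def collect_transcript(events):
    """Join the first alternative of every final_refinement event."""
    parts = []
    for event_type, alternatives in events:
        # Skip partial, final, status_code, and eou_update events.
        if event_type == 'final_refinement' and alternatives:
            parts.append(alternatives[0])
    return ' '.join(parts)


# The same sequence of events as in the sample result.
events = [
    ('status_code', None),
    ('partial', None),
    ('partial', ['hello world']),
    ('final', ['hello world']),
    ('final_refinement', ['Hello world']),
    ('eou_update', None),
    ('partial', None),
    ('status_code', None),
]
print(collect_transcript(events))  # Hello world
```

In the client itself, the pairs could be appended to a list inside the response loop instead of being printed.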
Java

- Install the dependencies:

    sudo apt update && sudo apt install --yes default-jdk maven

- Clone the repository with a Java application configuration:

    git clone https://github.com/yandex-cloud-examples/yc-speechkit-stt-java
- Go to the repository directory:

    cd yc-speechkit-stt-java

- Download a sample audio file in WAV format. Save the audio file to the directory with the repository.
- Build the project in this directory:

    mvn clean install

- Go to the target directory created by the build:

    cd target
- Specify the service account's API key:

    export API_KEY=<API_key>

- Run the Java program for speech recognition:

    java -cp speechkit_examples-1.0-SNAPSHOT.jar yandex.cloud.speechkit.examples.SttV3Client <path_to_audio_file>

  In the command, specify the absolute path to the sample audio file you downloaded.

  Result:

    sending initial request
    Done sending
    Stt stream completed
    Recognized text is I'm Yandex SpeechKit. I can turn any text into speech. Now you can, too!
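The Python client in this section is configured for raw LPCM, while the Java walkthrough uses a WAV file. If you want to try a 16-bit PCM WAV recording with the Python client as configured, one possible approach is to strip the WAV header first using the standard wave module. This is a minimal sketch; the function name `wav_to_raw_pcm` is hypothetical, and the recognition settings (sample_rate_hertz, audio_channel_count) would still need to match the values it returns.

```python
import wave


def wav_to_raw_pcm(wav_path, pcm_path):
    """Write the samples of a 16-bit PCM WAV file to a headerless file.

    Returns (sample_rate_hz, channels) read from the WAV header.
    """
    with wave.open(wav_path, 'rb') as wav:
        # LINEAR16_PCM means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples')
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, 'wb') as out:
        out.write(frames)
    return rate, channels


# Example call (paths are placeholders):
# rate, channels = wav_to_raw_pcm('speech.wav', 'speech.pcm')
```

The wave module only reads uncompressed PCM WAV files, so other encodings would need a different tool.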