Example use of streaming recognition with API v2
The example shows how you can recognize speech in LPCM format in real time using the SpeechKit API v2.
The example uses the following parameters:
- Language: Russian.
- Format of the audio stream: LPCM with a sampling rate of 8000 Hz.
- Profanity filter: True.
- Intermediate results (partial_results): True.
- Other parameters are left at their default values.
To use the API, you need the grpcio-tools package for Python and the grpc package for Node.js.
A Yandex account or federated account is authenticated using an IAM token. If you are using a service account, you do not need to include the folder ID in the request header. Learn more about authentication in the SpeechKit API.
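As a rough illustration of the authentication described above, the IAM token travels as gRPC call metadata in an `authorization` header. The helper name `build_auth_metadata` is hypothetical and not part of the SpeechKit API; the header name and the `Bearer` prefix follow the client examples in this section.

```python
def build_auth_metadata(iam_token):
    """Return gRPC call metadata carrying the IAM token.

    Hypothetical helper for illustration only; the header format
    mirrors the one used by the client examples in this section.
    """
    if not iam_token:
        raise ValueError('IAM token must not be empty')
    return (('authorization', 'Bearer %s' % iam_token),)


# Placeholder value; substitute a real IAM token when calling the API.
metadata = build_auth_metadata('<IAM_token>')
print(metadata)
```

The resulting tuple can be passed as the `metadata` argument of a gRPC stub call.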
To try the examples in this section:
- Clone the Yandex Cloud API repository:
      git clone https://github.com/yandex-cloud/cloudapi
- Download a sample audio file for recognition.
- Create a client application in Python 3 or Node.js:

Python 3
- Install the grpcio-tools package using the pip package manager:
      pip install grpcio-tools
- Go to the directory hosting the Yandex Cloud API repository, create an output directory, and generate the client interface code there:
      cd cloudapi
      mkdir output
      python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
        --python_out=output \
        --grpc_python_out=output \
        google/api/http.proto \
        google/api/annotations.proto \
        yandex/cloud/api/operation.proto \
        google/rpc/status.proto \
        yandex/cloud/operation/operation.proto \
        yandex/cloud/ai/stt/v2/stt_service.proto
  As a result, the stt_service_pb2.py and stt_service_pb2_grpc.py client interface files, as well as dependency files, will be created in the output directory.
- In the root of the output directory, create a file, e.g., test.py, and add the following code to it:
      #coding=utf8
      import argparse

      import grpc

      import yandex.cloud.ai.stt.v2.stt_service_pb2 as stt_service_pb2
      import yandex.cloud.ai.stt.v2.stt_service_pb2_grpc as stt_service_pb2_grpc

      CHUNK_SIZE = 4000

      def gen(folder_id, audio_file_name):
          # Specify the recognition settings.
          specification = stt_service_pb2.RecognitionSpec(
              language_code='ru-RU',
              profanity_filter=True,
              model='general',
              partial_results=True,
              audio_encoding='LINEAR16_PCM',
              sample_rate_hertz=8000
          )
          streaming_config = stt_service_pb2.RecognitionConfig(specification=specification, folder_id=folder_id)

          # Send a message with the recognition settings.
          yield stt_service_pb2.StreamingRecognitionRequest(config=streaming_config)

          # Read the audio file and send its contents in chunks.
          with open(audio_file_name, 'rb') as f:
              data = f.read(CHUNK_SIZE)
              while data != b'':
                  yield stt_service_pb2.StreamingRecognitionRequest(audio_content=data)
                  data = f.read(CHUNK_SIZE)

      def run(folder_id, iam_token, audio_file_name):
          # Connect to the server.
          cred = grpc.ssl_channel_credentials()
          channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
          stub = stt_service_pb2_grpc.SttServiceStub(channel)

          # Send data for recognition.
          it = stub.StreamingRecognize(gen(folder_id, audio_file_name), metadata=(
              ('authorization', 'Bearer %s' % iam_token),
          ))

          # Process the server responses and output the result to the console.
          try:
              for r in it:
                  try:
                      print('Start chunk: ')
                      for alternative in r.chunks[0].alternatives:
                          print('alternative: ', alternative.text)
                      print('Is final: ', r.chunks[0].final)
                      print('')
                  except LookupError:
                      print('Not available chunks')
          except grpc.RpcError as err:
              # grpc.RpcError is the public base class for errors raised by the call.
              print('Error code %s, message: %s' % (err.code(), err.details()))

      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          parser.add_argument('--token', required=True, help='IAM token')
          parser.add_argument('--folder_id', required=True, help='folder ID')
          parser.add_argument('--path', required=True, help='audio file path')
          args = parser.parse_args()

          run(args.folder_id, args.token, args.path)
  Where:
  - language_code: Recognition language.
  - profanity_filter: Profanity filter.
  - model: Language model.
  - partial_results: Filter of intermediate recognition results.
  - audio_encoding: Format of the audio stream.
  - sample_rate_hertz: Sampling rate of the audio stream.
- Set the folder ID:
      export FOLDER_ID=<folder_ID>
- Set the IAM token:
      export IAM_TOKEN=<IAM_token>
- Run the created file:
      python3 test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --path speech.pcm
  Where --path is the path to the audio file to recognize.
  Result:
      Start chunk:
      alternative: hello
      Is final: False

      Start chunk:
      alternative: hello world
      Is final: True
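The console output above mixes intermediate and final hypotheses. A minimal sketch of keeping only the final text is shown below. It assumes responses shaped like those handled in the client code (`chunks[0].final`, `chunks[0].alternatives[n].text`); the function name `collect_final_texts` and the stand-in objects built with `SimpleNamespace` are illustrative and not part of the SpeechKit API.

```python
from types import SimpleNamespace


def collect_final_texts(responses):
    """Return the first alternative of every final chunk, in order."""
    texts = []
    for response in responses:
        if not response.chunks:
            continue  # This response carries no chunks.
        chunk = response.chunks[0]
        if chunk.final and chunk.alternatives:
            texts.append(chunk.alternatives[0].text)
    return texts


def fake_response(text, final):
    """Build a stand-in object with the same attribute layout (demo data only)."""
    alternative = SimpleNamespace(text=text)
    chunk = SimpleNamespace(alternatives=[alternative], final=final)
    return SimpleNamespace(chunks=[chunk])


responses = [fake_response('hello', False), fake_response('hello world', True)]
print(collect_final_texts(responses))  # ['hello world']
```

In a real client, the same function could presumably be given the iterator returned by the streaming call instead of the stand-in list.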
Node.js
- Go to the directory with the Yandex Cloud API repository, create a directory named src, and generate a dependency file named package.json in it:
      cd cloudapi
      mkdir src
      cd src
      npm init
- Install the necessary packages using npm:
      npm install grpc @grpc/proto-loader google-proto-files --save
- Download a gRPC public key certificate from the official repository and save it in the root of the src directory. The code in the next step reads it as roots.pem.
- In the root of the src directory, create a file, e.g., index.js, and add the following code to it:
      const fs = require('fs');
      const grpc = require('grpc');
      const protoLoader = require('@grpc/proto-loader');

      const CHUNK_SIZE = 4000;

      // Get the folder ID and IAM token from the environment variables.
      const folderId = process.env.FOLDER_ID;
      const iamToken = process.env.IAM_TOKEN;

      // Read the file specified in the arguments.
      const audio = fs.readFileSync(process.argv[2]);

      // Specify the recognition settings.
      const request = {
          config: {
              specification: {
                  languageCode: 'ru-RU',
                  profanityFilter: true,
                  model: 'general',
                  partialResults: true,
                  audioEncoding: 'LINEAR16_PCM',
                  sampleRateHertz: '8000'
              },
              folderId: folderId
          }
      };

      // Set the interval between sent audio chunks, in milliseconds.
      // To calculate it for the LPCM format, use this formula: CHUNK_SIZE * 1000 / (2 * sampleRateHertz).
      const FREQUENCY = 250;

      const serviceMetadata = new grpc.Metadata();
      serviceMetadata.add('authorization', `Bearer ${iamToken}`);

      const packageDefinition = protoLoader.loadSync('../yandex/cloud/ai/stt/v2/stt_service.proto', {
          includeDirs: ['node_modules/google-proto-files', '..']
      });
      const packageObject = grpc.loadPackageDefinition(packageDefinition);

      // Connect to the server.
      const serviceConstructor = packageObject.yandex.cloud.ai.stt.v2.SttService;
      const grpcCredentials = grpc.credentials.createSsl(fs.readFileSync('./roots.pem'));
      const service = new serviceConstructor('stt.api.cloud.yandex.net:443', grpcCredentials);
      const call = service['StreamingRecognize'](serviceMetadata);

      // Send a message with recognition settings.
      call.write(request);

      // Read the audio file and send its contents in chunks.
      let offset = 0;
      const interval = setInterval(() => {
          if (offset < audio.length) {
              // subarray clamps the end index, so the last chunk may be shorter than CHUNK_SIZE.
              const chunk = audio.subarray(offset, offset + CHUNK_SIZE);
              call.write({audioContent: chunk});
              offset += CHUNK_SIZE;
          } else {
              call.end();
              clearInterval(interval);
          }
      }, FREQUENCY);

      // Process the server responses and output the result to the console.
      call.on('data', (response) => {
          console.log('Start chunk: ');
          response.chunks[0].alternatives.forEach((alternative) => {
              console.log('alternative: ', alternative.text);
          });
          console.log('Is final: ', Boolean(response.chunks[0].final));
          console.log('');
      });

      call.on('error', (response) => {
          // Output errors to the console.
          console.log(response);
      });
  Where:
  - languageCode: Recognition language.
  - profanityFilter: Profanity filter.
  - model: Language model.
  - partialResults: Filter of intermediate recognition results.
  - audioEncoding: Format of the audio stream.
  - sampleRateHertz: Sampling rate of the audio stream.
- Set the folder ID:
      export FOLDER_ID=<folder_ID>
- Set the IAM token:
      export IAM_TOKEN=<IAM_token>
- Run the created file:
      node index.js speech.pcm
  Where speech.pcm is the name of the audio file to recognize.
  Result:
      Start chunk:
      alternative: hello world
      Is final: true
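The Node.js client spaces out its writes using the formula given in its code comment, CHUNK_SIZE * 1000 / (2 * sampleRateHertz), whereas the Python client sends chunks as fast as the file can be read. As a sketch only, similar pacing could be added in Python as follows. The names `chunk_duration_ms` and `paced_chunks` are hypothetical, and the factor of 2 assumes 16-bit (2-byte) mono samples.

```python
import io
import time

CHUNK_SIZE = 4000          # Bytes per chunk, as in the examples above.
SAMPLE_RATE_HERTZ = 8000   # Sampling rate, as in the examples above.
BYTES_PER_SAMPLE = 2       # Assumption: 16-bit mono LPCM.


def chunk_duration_ms(chunk_size, sample_rate_hertz):
    """Milliseconds of audio contained in one chunk of LPCM data."""
    return chunk_size * 1000 / (BYTES_PER_SAMPLE * sample_rate_hertz)


def paced_chunks(stream, chunk_size, delay_seconds):
    """Yield chunks from a binary stream, pausing after each one."""
    data = stream.read(chunk_size)
    while data:
        yield data
        time.sleep(delay_seconds)
        data = stream.read(chunk_size)


print(chunk_duration_ms(CHUNK_SIZE, SAMPLE_RATE_HERTZ))  # 250.0

# Demo on in-memory silence with no delay, so the sketch finishes instantly.
audio = io.BytesIO(b'\x00' * 10000)
print([len(c) for c in paced_chunks(audio, CHUNK_SIZE, 0)])  # [4000, 4000, 2000]
```

To approximate real-time streaming, the delay would be `chunk_duration_ms(...) / 1000` seconds, which gives 0.25 s for the settings used here.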