Example use of streaming recognition with API v2
The example shows how you can recognize speech in LPCM format in real time using the SpeechKit API v2.
The example uses the following parameters:
- Language: Russian.
- Format of the audio stream: LPCM with a sampling rate of 8000 Hz.
- Profanity filter: True.
- Intermediate result filter: True.
- Other parameters are left at their defaults.
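For reference, here is a minimal sketch of how these parameters map onto the recognition settings built in the Python example later in this section (it uses the stt_service_pb2 module that is generated in a later step; the folder ID placeholder is illustrative):

```python
import yandex.cloud.ai.stt.v2.stt_service_pb2 as stt_service_pb2

# Recognition settings matching the parameters listed above.
specification = stt_service_pb2.RecognitionSpec(
    language_code='ru-RU',          # Language: Russian.
    audio_encoding='LINEAR16_PCM',  # Audio stream format: LPCM.
    sample_rate_hertz=8000,         # Sampling rate: 8000 Hz.
    profanity_filter=True,          # Profanity filter.
    partial_results=True,           # Return intermediate results.
    model='general'                 # Default language model.
)
config = stt_service_pb2.RecognitionConfig(
    specification=specification,
    folder_id='<folder_ID>'         # Placeholder: your folder ID.
)
```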
To use the API, you need the grpcio-tools package for Python or the grpc package for Node.js.
A Yandex account or federated account is authenticated using an IAM token. If you are using a service account, you do not need to include the folder ID in your requests. Learn more about authentication in the SpeechKit API.
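In both examples below, the IAM token is passed to the service as gRPC call metadata. A minimal Python sketch, assuming the token is stored in the IAM_TOKEN environment variable:

```python
import os
import grpc

# The IAM token is sent in the authorization metadata of each gRPC call.
iam_token = os.environ['IAM_TOKEN']
metadata = (('authorization', 'Bearer %s' % iam_token),)

# Secure channel to the recognition service endpoint.
channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', grpc.ssl_channel_credentials())
```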
To try the examples in this section:
- Clone the Yandex Cloud API repository:

```bash
git clone https://github.com/yandex-cloud/cloudapi
```

- Download a sample audio file for recognition.
- Create a client application:

Python 3
- Use the pip package manager to install the grpcio-tools package:

```bash
pip install grpcio-tools
```
- Go to the folder with the Yandex Cloud API repository, create a folder named output, and generate the client interface code in it:

```bash
cd cloudapi
mkdir output
python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
  --python_out=output \
  --grpc_python_out=output \
  google/api/http.proto \
  google/api/annotations.proto \
  yandex/cloud/api/operation.proto \
  google/rpc/status.proto \
  yandex/cloud/operation/operation.proto \
  yandex/cloud/ai/stt/v2/stt_service.proto
```
This will create the stt_service_pb2.py and stt_service_pb2_grpc.py client interface files, as well as their dependency files, in the output folder.
- Create a file (e.g., test.py) in the output folder root and add the following code to it:

```python
#coding=utf8
import argparse

import grpc

import yandex.cloud.ai.stt.v2.stt_service_pb2 as stt_service_pb2
import yandex.cloud.ai.stt.v2.stt_service_pb2_grpc as stt_service_pb2_grpc

CHUNK_SIZE = 4000


def gen(folder_id, audio_file_name):
    # Specify the recognition settings.
    specification = stt_service_pb2.RecognitionSpec(
        language_code='ru-RU',
        profanity_filter=True,
        model='general',
        partial_results=True,
        audio_encoding='LINEAR16_PCM',
        sample_rate_hertz=8000
    )
    streaming_config = stt_service_pb2.RecognitionConfig(specification=specification, folder_id=folder_id)

    # Send a message with the recognition settings.
    yield stt_service_pb2.StreamingRecognitionRequest(config=streaming_config)

    # Read the audio file and send its contents in chunks.
    with open(audio_file_name, 'rb') as f:
        data = f.read(CHUNK_SIZE)
        while data != b'':
            yield stt_service_pb2.StreamingRecognitionRequest(audio_content=data)
            data = f.read(CHUNK_SIZE)


def run(folder_id, iam_token, audio_file_name):
    # Establish a connection with the server.
    cred = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('stt.api.cloud.yandex.net:443', cred)
    stub = stt_service_pb2_grpc.SttServiceStub(channel)

    # Send the data for recognition.
    it = stub.StreamingRecognize(gen(folder_id, audio_file_name), metadata=(
        ('authorization', 'Bearer %s' % iam_token),
    ))

    # Process the server responses and output the result to the console.
    try:
        for r in it:
            try:
                print('Start chunk: ')
                for alternative in r.chunks[0].alternatives:
                    print('alternative: ', alternative.text)
                print('Is final: ', r.chunks[0].final)
                print('')
            except LookupError:
                print('No available chunks')
    except grpc.RpcError as err:
        print('Error code %s, message: %s' % (err.code(), err.details()))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--token', required=True, help='IAM token')
    parser.add_argument('--folder_id', required=True, help='folder ID')
    parser.add_argument('--path', required=True, help='audio file path')
    args = parser.parse_args()

    run(args.folder_id, args.token, args.path)
```
Where:

- language_code: Recognition language.
- profanity_filter: Profanity filter.
- model: Language model.
- partial_results: Filter of intermediate recognition results.
- audio_encoding: Audio stream format.
- sample_rate_hertz: Audio stream sampling rate.
- Set the folder ID:

```bash
export FOLDER_ID=<folder_ID>
```
- Set the IAM token:

```bash
export IAM_TOKEN=<IAM_token>
```
- Run the created file:

```bash
python3 test.py --token ${IAM_TOKEN} --folder_id ${FOLDER_ID} --path speech.pcm
```

Where --path is the path to the audio file for recognition.

Result:

```text
Start chunk:
alternative: Hello
Is final: False

Start chunk:
alternative: Hello world
Is final: True
```
Node.js

- Go to the folder with the Yandex Cloud API repository, create a folder named src, and generate a dependency file named package.json in it:

```bash
cd cloudapi
mkdir src
cd src
npm init
```
- Install the necessary packages using npm:

```bash
npm install grpc @grpc/proto-loader google-proto-files --save
```
- Download a gRPC public certificate from the official repository and save it in the root of the src folder.
- Create a file (e.g., index.js) in the src folder root and add the following code to it:

```js
const fs = require('fs');
const grpc = require('grpc');
const protoLoader = require('@grpc/proto-loader');

const CHUNK_SIZE = 4000;

// Get the folder ID and IAM token from the environment variables.
const folderId = process.env.FOLDER_ID;
const iamToken = process.env.IAM_TOKEN;

// Read the file specified in the arguments.
const audio = fs.readFileSync(process.argv[2]);

// Specify the recognition settings.
const request = {
    config: {
        specification: {
            languageCode: 'ru-RU',
            profanityFilter: true,
            model: 'general',
            partialResults: true,
            audioEncoding: 'LINEAR16_PCM',
            sampleRateHertz: '8000'
        },
        folderId: folderId
    }
};

// Set the audio sending frequency in milliseconds.
// For LPCM, you can calculate the frequency using this formula: CHUNK_SIZE * 1000 / (2 * sampleRateHertz).
const FREQUENCY = 250;

const serviceMetadata = new grpc.Metadata();
serviceMetadata.add('authorization', `Bearer ${iamToken}`);

const packageDefinition = protoLoader.loadSync('../yandex/cloud/ai/stt/v2/stt_service.proto', {
    includeDirs: ['node_modules/google-proto-files', '..']
});
const packageObject = grpc.loadPackageDefinition(packageDefinition);

// Establish a connection with the server.
const serviceConstructor = packageObject.yandex.cloud.ai.stt.v2.SttService;
const grpcCredentials = grpc.credentials.createSsl(fs.readFileSync('./roots.pem'));
const service = new serviceConstructor('stt.api.cloud.yandex.net:443', grpcCredentials);
const call = service['StreamingRecognize'](serviceMetadata);

// Send a message with the recognition settings.
call.write(request);

// Read the audio file and send its contents in chunks.
let i = 1;
const interval = setInterval(() => {
    if ((i - 1) * CHUNK_SIZE < audio.length) {
        // slice() clamps the end index, so the last (possibly partial) chunk is sent as well.
        const chunk = audio.slice((i - 1) * CHUNK_SIZE, i * CHUNK_SIZE);
        call.write({audioContent: chunk});
        i++;
    } else {
        call.end();
        clearInterval(interval);
    }
}, FREQUENCY);

// Process the server responses and output the result to the console.
call.on('data', (response) => {
    console.log('Start chunk: ');
    response.chunks[0].alternatives.forEach((alternative) => {
        console.log('alternative: ', alternative.text);
    });
    console.log('Is final: ', Boolean(response.chunks[0].final));
    console.log('');
});

// Output errors to the console.
call.on('error', (response) => {
    console.log(response);
});
```
Where:

- languageCode: Recognition language.
- profanityFilter: Profanity filter.
- model: Language model.
- partialResults: Filter of intermediate recognition results.
- audioEncoding: Audio stream format.
- sampleRateHertz: Audio stream sampling rate.
- Set the folder ID:

```bash
export FOLDER_ID=<folder_ID>
```
- Set the IAM token:

```bash
export IAM_TOKEN=<IAM_token>
```
- Run the created file:

```bash
node index.js speech.pcm
```

Where speech.pcm is the name of the audio file for recognition.

Result:

```text
Start chunk:
alternative: Hello world
Is final: true
```
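For reference, the FREQUENCY value of 250 ms used in the Node.js example follows from the formula given in the code comment: CHUNK_SIZE * 1000 / (2 * sampleRateHertz) = 4000 * 1000 / (2 * 8000) = 250. Each 4000-byte chunk holds 0.25 seconds of 16-bit, 8000 Hz LPCM audio, so sending one chunk every 250 ms approximates real-time streaming.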