Uploading audio data
Use this guide to upload data for API-based speech analysis to SpeechSense. This example uses the following parameters:
- Audio format: WAV.
- The dialog metadata is stored in metadata_example.json.
An IAM token or API key is used to authenticate the service account.
If you want to upload the chat text instead of voice call audio, follow this guide.
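The dialog metadata file mentioned in the parameters above is a flat JSON object with string values. As a minimal sketch, assuming your connection expects the same keys as the default metadata in the upload script later in this guide, such a file could be produced as follows. The build_metadata helper and all sample values are illustrative assumptions; the keys your connection actually requires may differ.

```python
import json
from datetime import datetime, timedelta
from typing import Dict


def build_metadata(start: datetime, duration_s: int) -> Dict[str, str]:
    """Return dialog metadata as a flat dict of strings.

    The keys mirror the default metadata in this guide's upload script;
    the values are placeholders, not real data.
    """
    end = start + timedelta(seconds=duration_s)
    return {
        'operator_name': 'Operator',
        'operator_id': '1111',
        'client_name': 'Client',
        'client_id': '2222',
        'date': start.isoformat(),
        # Millisecond precision, matching the timestamp format used in the script
        'date_from': start.isoformat(timespec='milliseconds'),
        'date_to': end.isoformat(timespec='milliseconds'),
        'direction_outgoing': 'true',
    }


if __name__ == '__main__':
    sample = build_metadata(datetime(2023, 9, 13, 17, 30), duration_s=60)
    # Print the JSON; redirect it to metadata_example.json to keep it
    print(json.dumps(sample, indent=2))
```

Saving the printed output as metadata_example.json gives a file that can be passed to the upload script through its --meta-path parameter.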
Getting started
To use the API, you will need Git, Python 3.6 or higher, and the grpcio-tools
package.
- In the management console, create a service account.
- Add the service account to the namespace with the Data editor role. This will authorize the service account to upload data to the connection you created.
- Create an API key or IAM token for the service account to authenticate with the API.
- Clone the Yandex Cloud API repository:
    git clone https://github.com/yandex-cloud/cloudapi
- Install the grpcio-tools package using the pip package manager:
    pip install grpcio-tools
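Before moving on, it may help to confirm that the local prerequisites listed above are in place. The following is a small sketch, not part of the official tooling: the function names are made up for illustration, and it only checks the Python version, the presence of git on PATH, and whether the grpc and grpc_tools modules (installed by grpcio-tools and its grpcio dependency) can be found.

```python
import importlib.util
import shutil
import sys
from typing import List

MIN_PYTHON = (3, 6)


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_prerequisites() -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append('Python 3.6 or higher is required')
    if shutil.which('git') is None:
        problems.append('git was not found on PATH')
    # grpcio-tools provides the grpc_tools module and depends on grpcio (grpc)
    for name in missing_modules(['grpc', 'grpc_tools']):
        problems.append('module {} is missing; try: pip install grpcio-tools'.format(name))
    return problems


if __name__ == '__main__':
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints nothing when every check passes. It does not verify the service account, its role, or the API key, which have to be checked in the management console.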
Uploading data
- Go to the folder hosting the Yandex Cloud API repository, create a folder named upload_data, and generate the client interface code in it. Then open the upload_data folder:
    cd <path_to_cloudapi_directory> && \
    mkdir upload_data && \
    python3 -m grpc_tools.protoc -I . \
      --python_out=./upload_data/ \
      --grpc_python_out=./upload_data/ \
      yandex/cloud/speechsense/v1/*
    cd upload_data
- In the upload_data folder, create the upload_grpc.py Python script to upload your data to a SpeechSense connection as a single message:
    import argparse
    import datetime
    import json
    from typing import Dict

    import grpc

    from yandex.cloud.speechsense.v1 import audio_pb2
    from yandex.cloud.speechsense.v1 import talk_service_pb2
    from yandex.cloud.speechsense.v1 import talk_service_pb2_grpc


    # For IAM authentication, replace the api_key parameter with iam_token
    def upload_talk(connection_id: str, metadata: Dict[str, str], api_key: str, audio_bytes: bytes):
        credentials = grpc.ssl_channel_credentials()
        channel = grpc.secure_channel('api.talk-analytics.yandexcloud.net:443', credentials)
        talk_service_stub = talk_service_pb2_grpc.TalkServiceStub(channel)

        # Generating an API request
        request = talk_service_pb2.UploadTalkRequest(
            metadata=talk_service_pb2.TalkMetadata(
                connection_id=str(connection_id),
                fields=metadata
            ),
            # Audio format: WAV
            audio=audio_pb2.AudioRequest(
                audio_metadata=audio_pb2.AudioMetadata(
                    container_audio=audio_pb2.ContainerAudio(
                        container_audio_type=audio_pb2.ContainerAudio.ContainerAudioType.CONTAINER_AUDIO_TYPE_WAV
                    )
                ),
                audio_data=audio_pb2.AudioChunk(data=audio_bytes)
            )
        )

        # Authentication type: API key
        response = talk_service_stub.Upload(request, metadata=(
            ('authorization', f'Api-Key {api_key}'),
            # For IAM authentication, supply this header instead:
            # ('authorization', f'Bearer {iam_token}'),
        ))

        # Display dialog ID
        print(f'Dialog ID: {response.talk_id}')


    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument('--key', required=True, help='API key or IAM token', type=str)
        parser.add_argument('--connection-id', required=True, help='Connection ID', type=str)
        parser.add_argument('--audio-path', required=True, help='Audio file path', type=str)
        parser.add_argument('--meta-path', required=False, help='JSON with the dialog metadata', type=str, default=None)
        args = parser.parse_args()

        # Default values if no metadata specified
        if args.meta_path is None:
            now = datetime.datetime.now().isoformat()
            metadata = {
                'operator_name': 'Operator',
                'operator_id': '1111',
                'client_name': 'Client',
                'client_id': '2222',
                'date': str(now),
                'date_from': '2023-09-13T17:30:00.000',
                'date_to': '2023-09-13T17:31:00.000',
                'direction_outgoing': 'true',
            }
        else:
            with open(args.meta_path, 'r') as fp:
                metadata = json.load(fp)

        with open(args.audio_path, 'rb') as fp:
            audio_bytes = fp.read()

        upload_talk(args.connection_id, metadata, args.key, audio_bytes)
- Specify the service account's API key:
    export API_KEY=<service_account_API_key>
  If using an IAM token, provide it instead of the API key:
    export IAM_TOKEN=<service_account_IAM_token>
- Run the upload_grpc.py script with the following parameters:
    python3 upload_grpc.py \
      --audio-path <audio_file> \
      --meta-path <metadata> \
      --connection-id <connection_ID> \
      --key ${API_KEY}
  Where:
  - --audio-path: Path to the audio file with the dialog.
  - --meta-path: Path to the file with the dialog metadata.
  - --connection-id: ID of the connection you upload the data to.
  - --key: API key for authentication. If using an IAM token, specify the IAM_TOKEN environment variable instead of API_KEY.
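The upload script sends the Api-Key header unconditionally, so switching to an IAM token means editing the script as its comments describe. As a sketch of one way to avoid that edit, the helper below builds the call metadata from the API_KEY and IAM_TOKEN environment variables. The auth_metadata name and the precedence rule (the IAM token wins when both variables are set) are assumptions made for illustration; the two header formats are the ones that appear in the script.

```python
import os
from typing import Mapping, Optional, Tuple


def auth_metadata(env: Optional[Mapping[str, str]] = None) -> Tuple[Tuple[str, str], ...]:
    """Build gRPC call metadata carrying the authorization header.

    Assumption for this sketch: IAM_TOKEN takes precedence over API_KEY
    when both are set.
    """
    env = os.environ if env is None else env
    iam_token = env.get('IAM_TOKEN')
    api_key = env.get('API_KEY')
    if iam_token:
        return (('authorization', 'Bearer {}'.format(iam_token)),)
    if api_key:
        return (('authorization', 'Api-Key {}'.format(api_key)),)
    raise ValueError('Set either the API_KEY or the IAM_TOKEN environment variable')
```

Inside upload_talk, the result could then be passed as talk_service_stub.Upload(request, metadata=auth_metadata()), leaving the rest of the script unchanged.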