Uploading audio data via gRPC API

Updated at May 14, 2025

Use this guide to upload data to SpeechSense for speech recognition and analysis via API. This example uses the following parameters:

Audio format: WAV.
The dialog metadata is stored in metadata.json.

An IAM token or API key is used to authenticate the service account.

You can discover SpeechSense features using a quick audio data upload via the management console.

If you want to upload the chat text instead of voice call audio, follow this guide.

Getting started

To use the Yandex Cloud API, you will need Git, Python 3.6 or higher, and the grpcio-tools package. Learn how to install Python.

To prepare for uploading audio recordings:

Create a connection of the Two-channel audio type.

If you want to upload linked dialogs, add the ticket_id string key to the connection's general metadata. The dialogs will be linked by this key.
Create a project with the new connection.

Voice call recordings will be uploaded to the project and connection you created.
In the management console, create a service account.
Add the service account to the namespace with the Data editor role. This will allow the service account to upload data to SpeechSense.
To authenticate to the Yandex Cloud API, create an API key or IAM token for the service account.

Clone the Yandex Cloud API repository:

```
git clone https://github.com/yandex-cloud/cloudapi
```

Install the grpcio-tools package using the pip package manager:
```
pip install grpcio-tools
```
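
Because this example uploads WAV files to a Two-channel audio connection, it may help to inspect a recording locally before sending it. The following is a minimal sketch that relies only on the Python standard library; the helper names `describe_wav` and `is_two_channel` are illustrative and are not part of the SpeechSense API. It assumes an uncompressed PCM WAV file, since the `wave` module raises `wave.Error` for other encodings.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second
            "duration_s": frames / rate if rate else 0.0,
        }


def is_two_channel(path: str) -> bool:
    """Return True if the file has exactly two channels."""
    return describe_wav(path)["channels"] == 2
```

For example, `is_two_channel("call.wav")` could be called before the upload step to skip mono recordings; `call.wav` is a placeholder file name.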

Uploading data

Note

Dates are specified in ISO 8601 format; the Z suffix denotes UTC (zero time offset). For Moscow time, replace Z at the end of the value with +03:00, e.g., 2025-04-24T14:34:19+03:00.
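
As an illustration of the two date forms mentioned in the note, the sketch below formats a timezone-aware `datetime` either in UTC with a trailing Z or with its own offset. The function names and the `MOSCOW` constant are assumptions made for this example only.

```python
from datetime import datetime, timedelta, timezone

# Fixed +03:00 offset used for the Moscow time example (assumption for this sketch)
MOSCOW = timezone(timedelta(hours=3))


def to_iso_utc(moment: datetime) -> str:
    """Convert an aware datetime to UTC and format it with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_iso_with_offset(moment: datetime) -> str:
    """Format an aware datetime in ISO 8601, keeping its own UTC offset."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


start = datetime(2025, 4, 24, 14, 34, 19, tzinfo=MOSCOW)
print(to_iso_with_offset(start))  # 2025-04-24T14:34:19+03:00
print(to_iso_utc(start))          # 2025-04-24T11:34:19Z
```

Both strings describe the same moment; either one can be used as a date value in the dialog metadata.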

Go to the folder hosting the Yandex Cloud API repository, create a folder named upload_data, and generate the client interface code in it. Then open the upload_data folder:

```
cd <path_to_cloudapi_directory> && \
mkdir upload_data && \
python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
     --python_out=./upload_data/ \
     --grpc_python_out=./upload_data/ \
     google/api/http.proto \
     google/api/annotations.proto \
     yandex/cloud/api/operation.proto \
     google/rpc/status.proto \
     yandex/cloud/operation/operation.proto \
     yandex/cloud/validation.proto \
     yandex/cloud/speechsense/v1/*.proto \
     yandex/cloud/speechsense/v1/*/*.proto
cd upload_data
```

In the upload_data directory, create the upload_grpc.py Python script to upload your data to a SpeechSense connection as a single message:

```
import argparse
import datetime
import json
from typing import Dict

import grpc

from yandex.cloud.speechsense.v1 import audio_pb2
from yandex.cloud.speechsense.v1 import talk_service_pb2
from yandex.cloud.speechsense.v1 import talk_service_pb2_grpc


# For IAM token authentication, replace the `api_key` parameter with `iam_token`
def upload_talk(connection_id: str, metadata: Dict[str, str], api_key: str, audio_bytes: bytes):
    credentials = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('api.speechsense.yandexcloud.net:443', credentials)
    talk_service_stub = talk_service_pb2_grpc.TalkServiceStub(channel)

    # Forming a request to the API
    request = talk_service_pb2.UploadTalkRequest(
        metadata=talk_service_pb2.TalkMetadata(
            connection_id=str(connection_id),
            fields=metadata
        ),
        # Audio format: WAV
        audio=audio_pb2.AudioRequest(
            audio_metadata=audio_pb2.AudioMetadata(
                container_audio=audio_pb2.ContainerAudio(
                    container_audio_type=audio_pb2.ContainerAudio.ContainerAudioType.CONTAINER_AUDIO_TYPE_WAV
                )
            ),
            audio_data=audio_pb2.AudioChunk(data=audio_bytes)
        )
    )
    # Authentication type: API key
    response = talk_service_stub.Upload(request, metadata=(
        ('authorization', f'Api-Key {api_key}'),
        # For IAM token authentication, provide the header
        # ('authorization', f'Bearer {iam_token}'),
    ))

    # Displaying the dialog ID
    print(f'Dialog ID: {response.talk_id}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--key', required=True, help='API key or IAM token', type=str)
    parser.add_argument('--connection-id', required=True, help='Connection ID', type=str)
    parser.add_argument('--audio-path', required=True, help='Audio file path', type=str)
    parser.add_argument('--meta-path', required=False, help='JSON with the dialog metadata', type=str, default=None)
    args = parser.parse_args()

    # Default values to use if metadata is not defined
    if args.meta_path is None:
        # Current time in UTC with an explicit offset, in ISO 8601 format
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        metadata = {
            'operator_name': 'Operator',
            'operator_id': '1111',
            'client_name': 'Client',
            'client_id': '2222',
            'date': now,
            'date_from': '2023-09-13T17:30:00.000Z',
            'date_to': '2023-09-13T17:31:00.000Z',
            'direction_outgoing': 'true',
        }
    else:
        with open(args.meta_path, 'r') as fp:
            metadata = json.load(fp)

    with open(args.audio_path, 'rb') as fp:
        audio_bytes = fp.read()
    upload_talk(args.connection_id, metadata, args.key, audio_bytes)
```

In the upload_data directory, create a file named metadata.json with your conversation metadata:
```
{
   "operator_name": "<agent_name>",
   "operator_id": "<agent_ID>",
   "client_name": "<customer_name>",
   "client_id": "<customer_ID>",
   "date": "<start_date>",
   "direction_outgoing": "<outgoing_direction:_true_or_false>",
   "language": "<language>",
   <additional_connection_parameters>
}
```
The file's fields must match the parameters of the connection you are uploading audio recordings to. The template above shows the required fields for Two-channel audio type connections. If you added other parameters to the connection, specify them in the metadata.json file; e.g., to upload linked dialogs, add the following parameter to your file:
```
{
   ...
   "ticket_id": "<task_number>"
}
```
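
Since the metadata fields must match the connection parameters, a quick local check before uploading can catch omissions early. The sketch below is one possible approach: the field list is copied from the metadata.json template above, and the names `REQUIRED_FIELDS`, `missing_fields`, and `load_metadata` are hypothetical helpers, not part of the SpeechSense API. Adjust the list if your connection defines different parameters.

```python
import json

# Fields taken from the metadata.json template for Two-channel audio connections
REQUIRED_FIELDS = (
    "operator_name",
    "operator_id",
    "client_name",
    "client_id",
    "date",
    "direction_outgoing",
    "language",
)


def missing_fields(metadata: dict, extra_required=()) -> list:
    """Return the names of required fields that are absent or empty."""
    required = REQUIRED_FIELDS + tuple(extra_required)
    return [name for name in required if not str(metadata.get(name, "")).strip()]


def load_metadata(path: str, extra_required=()) -> dict:
    """Read a metadata file and raise ValueError if required fields are missing."""
    with open(path, "r", encoding="utf-8") as fp:
        metadata = json.load(fp)
    missing = missing_fields(metadata, extra_required)
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    return metadata
```

For linked dialogs, calling `load_metadata("metadata.json", extra_required=("ticket_id",))` would also require the `ticket_id` key.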
Specify the service account's API key:
```
export API_KEY=<service_account_API_key>
```
If using an IAM token, provide it instead of the API key:
```
export IAM_TOKEN=<service_account_IAM_token>
```
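
The two authentication options differ only in the authorization header sent with the gRPC call: `Api-Key <key>` or `Bearer <token>`, as in the script comments. A minimal sketch of selecting the header from the environment variables above might look like this; `build_auth_metadata` is a hypothetical helper, and preferring `API_KEY` when both variables are set is an arbitrary choice made for the example.

```python
import os


def build_auth_metadata(environ=None) -> tuple:
    """Build gRPC call metadata from API_KEY or IAM_TOKEN (hypothetical helper)."""
    env = os.environ if environ is None else environ
    api_key = env.get("API_KEY")
    iam_token = env.get("IAM_TOKEN")
    if api_key:
        # API key authentication
        return (("authorization", f"Api-Key {api_key}"),)
    if iam_token:
        # IAM token authentication
        return (("authorization", f"Bearer {iam_token}"),)
    raise RuntimeError("Set the API_KEY or IAM_TOKEN environment variable")
```

The returned tuple has the same shape as the `metadata` argument passed to `talk_service_stub.Upload` in upload_grpc.py.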
Run the upload_grpc.py script with the following parameters:
```
python3 upload_grpc.py \
   --audio-path <audio_file> \
   --meta-path <metadata> \
   --connection-id <connection_ID> \
   --key ${API_KEY}
```
Where:
- --audio-path: Path to the audio file with the dialog.
- --meta-path: Path to the file with the dialog metadata.
- --connection-id: ID of the connection you upload the data to.
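- --key: API key for authentication. If using an IAM token, pass the IAM_TOKEN environment variable instead of API_KEY and change the authorization header in the script from Api-Key to Bearer, as noted in the script comments.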
