Yandex SpeechSense


Uploading audio data with splitting via the gRPC API

Written by Yandex Cloud
Updated on June 20, 2025
  • Getting started
  • Uploading data

Use this guide to upload data to SpeechSense for speech recognition and analysis via the gRPC API. This example uses the following parameters:

  • Audio format: WAV.
  • The dialog metadata is stored in metadata.json.

An API key or IAM token is used to authenticate the service account.

To quickly explore SpeechSense features, you can upload audio data via the management console.

If you want to upload the chat text instead of voice call audio, follow this guide.

Getting started

To use the Yandex Cloud API, you will need Git, Python 3.6 or higher, and the grpcio-tools package. Learn how to install Python.

To prepare for uploading audio recordings:

  1. Create a connection of the Single-channel audio type with additional dialog splitting settings.

    If you want to upload linked dialogs, add the ticket_id string key to the connection's general metadata. The dialogs will be linked by this key.

  2. Create a project with the new connection.

    The audio recordings of the dialogs will be uploaded to this project and connection.

  3. In the management console, create a service account.

  4. Add the service account to the namespace with the Data editor role. This will allow the service account to upload data to SpeechSense.

  5. To authenticate to the Yandex Cloud API, create an API key or IAM token for the service account.

  6. Clone the Yandex Cloud API repository:

    git clone https://github.com/yandex-cloud/cloudapi
    
  7. Install the grpcio-tools package using the pip package manager:

    pip install grpcio-tools
    
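Optionally, before generating the client code in the next section, you can check that the prerequisites are in place. This is a minimal illustrative sketch, not part of the official guide: it only verifies the Python version and that the grpcio-tools package installed in the previous step can be imported.

    # check_prerequisites.py: an optional sanity check (illustrative only)
    import sys

    # The Yandex Cloud API examples require Python 3.6 or higher
    assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"

    try:
        from grpc_tools import protoc  # provided by the grpcio-tools package
    except ImportError as err:
        raise SystemExit("grpcio-tools is not installed: run 'pip install grpcio-tools'") from err

    print("Python and grpcio-tools are ready")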

Uploading data

Note

The dates are in ISO 8601 format, UTC with a zero time offset. For Moscow time, use +03:00 instead of Z at the end of the value: 2025-04-24T14:34:19+03:00.

Maximum audio duration is 4 hours.
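
If you generate the date value for the dialog metadata in Python, the standard library can produce timestamps in this format. A minimal illustrative sketch:

    from datetime import datetime, timezone, timedelta

    # ISO 8601 timestamp in UTC with zero offset, e.g. 2025-04-24T14:34:19Z
    date_utc = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

    # The same format with the Moscow offset, e.g. 2025-04-24T17:34:19+03:00
    date_msk = datetime.now(timezone(timedelta(hours=3))).isoformat(timespec='seconds')

    print(date_utc, date_msk)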

  1. Go to the folder hosting the Yandex Cloud API repository, create a folder named upload_data, and generate the client interface code in it. Then open the upload_data folder:

    Bash
    cd <path_to_cloudapi_directory> && \
    mkdir upload_data && \
    python3 -m grpc_tools.protoc -I . -I third_party/googleapis \
         --python_out=./upload_data/ \
         --grpc_python_out=./upload_data/ \
         google/api/http.proto \
         google/api/annotations.proto \
         yandex/cloud/api/operation.proto \
         google/rpc/status.proto \
         yandex/cloud/operation/operation.proto \
         yandex/cloud/validation.proto \
         yandex/cloud/speechsense/v1/*.proto \
         yandex/cloud/speechsense/v1/*/*.proto
    cd upload_data
    
  2. In the upload_data folder, create the upload_grpc.py Python script to upload your data to a SpeechSense connection. The file will be transferred in chunks:

    import argparse
    import json
    from typing import Dict
    
    import grpc
    
    from yandex.cloud.speechsense.v1 import talk_service_pb2
    from yandex.cloud.speechsense.v1 import talk_service_pb2_grpc
    from yandex.cloud.speechsense.v1 import audio_pb2
    
    
    # Configure the chunk size
    CHUNK_SIZE_BYTES = 1 * 1024 * 1024
    
    
    def upload_audio_requests_iterator(connection_id: str, metadata: Dict[str, str], audio_path: str):
        # Transfer the general dialog metadata
        yield talk_service_pb2.StreamTalkRequest(
            metadata=talk_service_pb2.TalkMetadata(
                connection_id=connection_id,
                fields=metadata
            )
        )
        # Transfer the audio metadata
        yield talk_service_pb2.StreamTalkRequest(
            audio=audio_pb2.AudioStreamingRequest(
                audio_metadata=audio_pb2.AudioMetadata(
                    container_audio=audio_pb2.ContainerAudio.ContainerAudioType.CONTAINER_AUDIO_TYPE_WAV
                )
            )
        )
        with open(audio_path, mode='rb') as fp:
            data = fp.read(CHUNK_SIZE_BYTES)
            while len(data) > 0:
                # Transfer the audio file's next byte chunk
                yield talk_service_pb2.StreamTalkRequest(
                    audio=audio_pb2.AudioStreamingRequest(
                        chunk=audio_pb2.AudioChunk(data=data)
                    )
                )
                data = fp.read(CHUNK_SIZE_BYTES)
    
    
    def upload_talk(endpoint: str, connection_id: str, metadata: Dict[str, str], token: str, audio_path: str):
        # Establish a connection with the server
        credentials = grpc.ssl_channel_credentials()
        channel = grpc.secure_channel(endpoint, credentials)
        talk_service_stub = talk_service_pb2_grpc.TalkServiceStub(channel)
    
        # Transfer a request iterator and get a response from the server
        response = talk_service_stub.UploadAsStream(
            upload_audio_requests_iterator(connection_id, metadata, audio_path),
            metadata=(('authorization', token),)
        )
    
        print(f'Talk id: {response.talk_id}')
    
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
    
        parser.add_argument('--endpoint', required=False, help='API Endpoint', type=str, default='api.speechsense.yandexcloud.net:443')
        parser.add_argument('--token', required=True, help='IAM token', type=str)
        parser.add_argument('--token-type', required=False, help='Token type', choices=['iam-token', 'api-key'], default='iam-token', type=str)
        parser.add_argument('--connection-id', required=True, help='Connection Id', type=str)
        parser.add_argument('--audio-path', required=True, help='Audio file path', type=str)
        parser.add_argument('--meta-path', required=True, help='Talk metadata json', type=str)
        args = parser.parse_args()
    
        # The dialog metadata must contain these required fields
        required_keys = [
            "operator_name",
            "operator_id",
            "date"
        ]
        with open(args.meta_path, 'r') as fp:
            metadata = json.load(fp)
        for required_key in required_keys:
            if required_key not in metadata:
                raise ValueError(f"Metadata doesn't contain one of the reqiured keys: {required_key}.")
    
        # Format the authorization header value according to the credential type
        if args.token_type == 'iam-token':
            token = f'Bearer {args.token}'
        elif args.token_type == 'api-key':
            token = f'Api-Key {args.token}'
    
        # Upload the dialog audio and its metadata to SpeechSense
        upload_talk(args.endpoint, args.connection_id, metadata, token, args.audio_path)
    
  3. In the upload_data directory, create a file named metadata.json with your conversation metadata:

    {
        "operator_name": "<agent_name>",
        "operator_id": "<agent_ID>",
        "date": "<start_date>",
        "language": "<language>",
        <additional_connection_parameters>
    }
    

    The file's fields must match the parameters of the connection you are uploading audio recordings to. The template above shows the required fields for Single-channel audio type connections. If you added other parameters to the connection, specify them in the metadata.json file; e.g., to upload linked dialogs, add the following parameter to your file:

    {
       ...
       "ticket_id": "<task_number>"
    }
    
  4. Specify the service account's API key:

    export API_KEY=<service_account_API_key>
    

    If using an IAM token, provide it instead of the API key:

    export IAM_TOKEN=<service_account_IAM_token>
    
  5. Run the upload_grpc.py script with the following parameters:

    python3 upload_grpc.py \
       --audio-path <audio_file> \
       --meta-path <metadata> \
       --connection-id <connection_ID> \
   --token ${API_KEY} \
   --token-type api-key
    

    Where:

    • --audio-path: Path to the file with the dialog audio.
    • --meta-path: Path to the file with the dialog metadata.
    • --connection-id: ID of the connection you upload the data to.
     • --token: API key or IAM token of the service account used for authentication.
     • --token-type: Credential type: api-key for an API key or iam-token (default) for an IAM token. If using an IAM token, pass --token ${IAM_TOKEN} and omit --token-type.
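
As an alternative to the command line, you can also call the upload function from your own Python code, for example to upload many recordings in a batch. The following is a minimal sketch rather than part of the official guide; it assumes upload_grpc.py is importable from the current directory, and the connection ID, file name, and metadata values are placeholders to replace with your own:

    import os

    from upload_grpc import upload_talk

    # Credential: an API key here; for an IAM token use f"Bearer {os.environ['IAM_TOKEN']}"
    token = f"Api-Key {os.environ['API_KEY']}"

    # Dialog metadata; ticket_id is only needed if the connection defines it (linked dialogs)
    metadata = {
        "operator_name": "<agent_name>",
        "operator_id": "<agent_ID>",
        "date": "2025-04-24T14:34:19Z",
        "language": "<language>",
        "ticket_id": "<task_number>",
    }

    upload_talk(
        endpoint="api.speechsense.yandexcloud.net:443",
        connection_id="<connection_ID>",
        metadata=metadata,
        token=token,
        audio_path="<audio_file>",
    )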
