Getting started with Yandex SpeechSense
Yandex SpeechSense enables you to analyze your business's communication channels through call recordings and supports integration with your PBX and CRM systems. SpeechSense uses Yandex SpeechKit speech technologies to transcribe speech and perform high-quality analysis of dialogs.
You can upload your own audio recordings to SpeechSense or use a demo recording.
Getting started
- Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet. For information on how to get started with Yandex Cloud, see Getting started with Yandex Cloud.
- Accept the user agreement.
- In Yandex Cloud Billing, make sure you have a billing account linked and that it has the `ACTIVE` or `TRIAL_ACTIVE` status. If you do not have a billing account yet, create one.
- Make sure that your account has the `speech-sense.spaces.creator` role assigned.
- Open the SpeechSense home page.
- Select the organization to work with SpeechSense in, or create a new one.
Configure the environment
- Create a space where all your projects will be stored: select Create space, enter a name, add a description (optional), and click Create.
- Link a billing account to the space. This account will be debited for the use of SpeechSense.
Tip
You can only manage a billing account if you have a Yandex account. If you use Yandex Cloud through an identity federation, contact support.
- Go to the Connections tab and create a connection using your audio's metadata:
  - Enter the Connection name.
  - Select Two-channel audio under Data type.
  - Set connection parameters using the metadata of your audio recordings:
    - Under Agent, specify the number of the track that contains the agent's voice in your audio recordings and define their displayed name (`Agent` by default).
    - Under Customer, specify the number of the track that contains the customer's voice in your audio recordings and define their displayed name (`Customer` by default).
    - Under Shared metadata, change the parameter names used in the system, if necessary.
    - If your metadata includes additional information you want to save and analyze, add it to the appropriate section. Provide the parameter key (it must match the key in the metadata file), specify a type, and enter a display name to use in the system. The supported additional parameter types are `Date`, `String`, `Number`, `Logical`, and `JSON`.
  - Click Create connection.
Example of metadata.json:

```json
{
  "direction_outgoing": "true",
  "client_id": "456",
  "client_name": "John Doe",
  "date": "2023-09-29T09:08:38.958Z",
  "date_to": "2023-09-29T09:15:07.897Z",
  "language": "RU",
  "operator_id": "123",
  "operator_name": "Jane Smith"
}
```
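Since the parameter keys in your metadata file must match the keys you configured in the connection, it can help to validate files before upload. Below is a minimal sketch of such a check; the `REQUIRED_KEYS` set mirrors the example above and is an assumption you should adjust to your own connection parameters:

```python
import json

# Keys the example connection above expects. This set is an
# assumption based on the sample metadata.json; adjust it to match
# the parameters configured in your own connection.
REQUIRED_KEYS = {
    "direction_outgoing", "client_id", "client_name",
    "date", "date_to", "language", "operator_id", "operator_name",
}

def check_metadata(path):
    """Load a metadata file and raise if any expected key is missing."""
    with open(path, encoding="utf-8") as f:
        metadata = json.load(f)
    missing = REQUIRED_KEYS - metadata.keys()
    if missing:
        raise ValueError(f"metadata is missing keys: {sorted(missing)}")
    return metadata
```

Running this over each metadata file before upload catches mismatched or missing keys early, instead of after the dialog fails to appear in your project.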
- Create a project: on the space page, click Create project, enter a project name, and add a connection to the project. For each connection, you can select up to two filtering rules based on the connection metadata. With filters, you can ensure that only the dialogs you need are added to the project. Once you are done adding connections and configuring filters, click Create project.
Upload audio data
SpeechSense uses the gRPC API to upload data.
SpeechSense supports the following audio file formats:
- LPCM: `AUDIO_ENCODING_LINEAR16_PCM`
- WAV: `CONTAINER_AUDIO_TYPE_WAV`
- OggOpus: `CONTAINER_AUDIO_TYPE_OGG_OPUS`
- MP3: `CONTAINER_AUDIO_TYPE_MP3`
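When uploading programmatically, you need to pass the right format identifier for each file. The identifiers below come from the list above; the extension-based mapping helper itself is only an illustrative convention (raw LPCM has no standard extension, so `.pcm` is assumed here):

```python
from pathlib import Path

# Map common file extensions to the SpeechSense format identifiers
# listed above. The ".pcm" entry is an assumed convention for raw LPCM.
AUDIO_FORMATS = {
    ".wav": "CONTAINER_AUDIO_TYPE_WAV",
    ".ogg": "CONTAINER_AUDIO_TYPE_OGG_OPUS",
    ".opus": "CONTAINER_AUDIO_TYPE_OGG_OPUS",
    ".mp3": "CONTAINER_AUDIO_TYPE_MP3",
    ".pcm": "AUDIO_ENCODING_LINEAR16_PCM",
}

def audio_format_for(filename):
    """Return the SpeechSense format identifier for a file name."""
    ext = Path(filename).suffix.lower()
    try:
        return AUDIO_FORMATS[ext]
    except KeyError:
        raise ValueError(f"unsupported audio format: {ext!r}")
```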
To upload audio data to SpeechSense:
- Create a service account.
- Add the service account to the space with the `speech-sense.data.editor` role. To learn more about the roles available in the service, see Access management in SpeechSense.
- Create an API key or IAM token for the service account to authenticate with the API. Learn more about authentication in the SpeechSense API.
- Upload your data using a Python data transfer script. You must provide your entire audio recording as one message.
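The actual upload call is made through gRPC stubs generated from the SpeechSense API definitions, which are not reproduced here. The sketch below only gathers the pieces that call needs: the full audio file as a single bytes object (the entire recording must go in one message), the parsed metadata, and an authorization header. The `Api-Key` header format follows the general Yandex Cloud API-key convention and is an assumption here; check the SpeechSense API documentation for the exact request shape:

```python
import json
from pathlib import Path

def build_upload_payload(audio_path, metadata_path, api_key):
    """Gather the inputs for a SpeechSense upload request.

    The gRPC request and stub classes generated from the SpeechSense
    API are not shown; this helper only prepares their inputs.
    """
    # The entire recording is read at once: SpeechSense expects
    # the whole audio file in a single message.
    audio_bytes = Path(audio_path).read_bytes()
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    # Assumed header format for API-key authentication.
    auth_header = ("authorization", f"Api-Key {api_key}")
    return audio_bytes, metadata, auth_header
```

You would then pass `auth_header` as gRPC call metadata and place `audio_bytes` and the metadata fields into the generated request message, per the data transfer script from the SpeechSense documentation.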