SpeechSense

Updated on June 24, 2026

Yandex SpeechSense is a speech analytics platform within Stackland. It transcribes and analyzes audio recordings of conversations, extracts text, classifies customer interactions, and generates analytical reports.

SpeechSense is not included in the basic Stackland package and requires a separate license.

Architecture

SpeechSense consists of three components:

YandexGPT for SpeechSense: Natural language processing and text generation. Used to summarize conversations and classify interactions. Requires GPU resources.
SpeechKit: Speech recognition and synthesis. Converts audio to text. Requires GPU resources.
SpeechSense: Speech analytics, data processing, and web UI. Coordinates YandexGPT and SpeechKit and provides a UI for working with the analysis results.

Infrastructure requirements

GPU

SpeechSense requires NVIDIA® GPU nodes:

YandexGPT Pro: 2 × NVIDIA® H100
SpeechKit STT Backend: 1 × NVIDIA® A100
SpeechKit Embeddings: 1 × NVIDIA® H100

Before installing SpeechSense, enable the NVIDIA® GPU support component.

Resources

TA services do not need a GPU but are CPU- and RAM-intensive. We recommend allocating at least 32 vCPUs and 64 GB of RAM for them.

Dependencies

SpeechSense relies on the following Stackland components:

Managed Service for PostgreSQL: Metadata and state storage.
Managed Service for ClickHouse®: Analytical queries and storage of large-scale data.
Managed Service for Apache Kafka®: Streaming data processing.
Object Storage: Storage of audio files and models.
NVIDIA® GPU support: GPU resource management.
Identity and Access Management: User authentication and authorization.

When SpeechSense is enabled, the controller automatically checks for these dependencies and provisions the required database clusters, Apache Kafka® topics, and certificates.

Configuration

To manage SpeechSense, use the SpeechsenseConfig custom resource.

Here is an example:

apiVersion: stackland.yandex.cloud/v1alpha1
kind: SpeechsenseConfig
metadata:
  name: default
spec:
  enabled: true
  settings:
    s3: # Optional. Specify only if you need external storage
      endpoint: "<object_storage_address>"
      accessKeyID: "<key_ID>"
      secretAccessKey: "<secret_key>"

Where:

enabled: Enables/disables the component.
settings.s3.endpoint: Object Storage address.
settings.s3.accessKeyID: Storage access key ID.
settings.s3.secretAccessKey: Storage secret access key.
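
Since the s3 block is marked as optional, a minimal manifest can presumably leave it out. The sketch below reuses only the fields from the example above. It assumes that the settings section may be omitted entirely when no external storage is used and that the component then relies on the Object Storage dependency of the installation; neither point is confirmed here, so check both against your environment.

```yaml
# Minimal sketch: enable SpeechSense without external storage.
# Assumption: omitting spec.settings is valid when the s3 block is not needed.
apiVersion: stackland.yandex.cloud/v1alpha1
kind: SpeechsenseConfig
metadata:
  name: default
spec:
  enabled: true
```

To disable the component, set enabled to false in the same resource.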
