© 2025 Direct Cursus Technology L.L.C.

Yandex SpeechKit Brand Voice

Written by
Yandex Cloud
Updated at September 18, 2025

The SpeechKit Brand Voice technology lets you create unique voices for your speech synthesis model. It can synthesize both plain text and pattern-based text. Patterns are phrases with variables that are replaced with prepared text. To cover different practical uses of trained models, Yandex Cloud offers three branches of SpeechKit Brand Voice.

| | Brand Voice Premium | Brand Voice Lite | Brand Voice Call Center |
| --- | --- | --- | --- |
| Voice | Voice based on artist recordings | Voice based on artist recordings | Voice copied from the pattern |
| Usage | Full-text synthesis. Pattern-based synthesis. | Full-text synthesis. Pattern-based synthesis is not supported. | Pattern-based synthesis. The variable part should not exceed 25% of the pattern. The same restriction applies to the duration of the variable part relative to the duration of the final audio. |
| Emotions and roles | Copying emotions in pattern-based synthesis. Developing additional roles. | Copying emotions in pattern-based synthesis. Developing additional roles. | Copying emotions in pattern-based synthesis. |
| Sampling frequency in source audio recordings | 48 kHz | 48 kHz | 8 kHz or higher |
| Sampling frequency in synthesized audio recordings | 22 kHz | 22 kHz | 8 kHz |

To create a unique Brand Voice Premium voice for your business, fill out this form.

SpeechKit Brand Voice Premium

SpeechKit Brand Voice Premium is suitable for any business task:

  • Voice assistants.
  • Call center robot operators.
  • Text-to-speech conversion of any text.

Creating a full-fledged model with a unique voice requires a large volume of audio recordings. Yandex Cloud experts will help you prepare the data for SpeechKit Brand Voice Premium model training, select a studio and an artist for you, and support you at each step of voice creation.

Once created, the SpeechKit Brand Voice Premium voice can be enhanced with various roles.

SpeechKit Brand Voice Lite

Create your unique voice with SpeechKit Brand Voice Lite by uploading a small set of marked-up audio samples (at least 30 minutes in total). As a result, you get the URI of a fine-tuned model that you can access from your applications via the API.

The quality of synthesized speech depends directly on the quality of audio recordings used to train the model. When creating a SpeechKit Brand Voice Lite voice, you are in charge of the entire process of training data preparation.
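Since data preparation is your responsibility, a quick pre-upload check can catch obvious problems early. The sketch below is only an illustration and not part of SpeechKit: it assumes the recordings are PCM WAV files, and it treats the 30-minute minimum and the 48 kHz source sampling frequency mentioned on this page as the values to check. The names `ClipInfo`, `read_clip`, and `check_dataset` are hypothetical.

```python
import wave
from dataclasses import dataclass
from typing import List

MIN_TOTAL_SECONDS = 30 * 60  # "30 minutes or more", as stated on this page
EXPECTED_RATE_HZ = 48_000    # source sampling frequency listed for Brand Voice Lite


@dataclass
class ClipInfo:
    path: str
    seconds: float
    rate_hz: int


def read_clip(path: str) -> ClipInfo:
    """Read the duration and sampling rate of a PCM WAV file (assumed format)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return ClipInfo(path, wav.getnframes() / rate, rate)


def check_dataset(clips: List[ClipInfo]) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = [
        f"{c.path}: sampling rate is {c.rate_hz} Hz, expected {EXPECTED_RATE_HZ} Hz"
        for c in clips
        if c.rate_hz != EXPECTED_RATE_HZ
    ]
    total = sum(c.seconds for c in clips)
    if total < MIN_TOTAL_SECONDS:
        problems.append(
            f"total duration is {total / 60:.1f} min, need at least 30 min"
        )
    return problems
```

A typical use would be `check_dataset([read_clip(p) for p in paths])` over your recording files. Passing these checks says nothing about recording quality or markup, which are covered by the data requirements for SpeechKit Brand Voice Lite.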

For more information on how to train your own model, see Data for SpeechKit Brand Voice Lite.

SpeechKit Brand Voice Call Center

SpeechKit Brand Voice Call Center is purpose-designed for call center automation and other business scenarios involving phone calls:

  • Telemarketing.
  • Managing calls to level 1 technical support.
  • Surveys.
  • Call center automation.

With Brand Voice Call Center, you do not have to train a special model based on your artist's voice, as the voice is copied directly from the patterns you provide for phrase generation. The speech is synthesized as a whole rather than stitched together from a pre-recorded pattern and a generated variable part.

You can use SpeechKit Brand Voice Call Center to automate your standard dialogs.

For example, if you have an audio recording of the phrase "Hi Michael, I am calling from Thunderclouds. My name is Anastasia. Is it a good time to talk?", you can transform it into "Hi Ann, I am calling from New Doors. My name is Matt. Is it a good time to talk?" without having to record any additional phrases.
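To make the idea of a pattern with variables concrete, here is a minimal text-only sketch of the substitution in the example above. It is an illustration, not the SpeechKit API: the `{name}` placeholder syntax and the `fill_pattern` helper are assumptions made for this example, and the actual markup is described in the requirements for synthesized texts.

```python
import re
from typing import Dict


def fill_pattern(pattern: str, variables: Dict[str, str]) -> str:
    """Replace each {name} placeholder with its value; fail on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for variable {name!r}")
        return variables[name]

    return re.sub(r"\{(\w+)\}", substitute, pattern)


# Hypothetical pattern text: the fixed part stays, the variables change per call.
PATTERN = (
    "Hi {name}, I am calling from {company}. "
    "My name is {agent}. Is it a good time to talk?"
)

print(fill_pattern(PATTERN, {"name": "Ann", "company": "New Doors", "agent": "Matt"}))
# prints: Hi Ann, I am calling from New Doors. My name is Matt. Is it a good time to talk?
```

In the service itself, the audio for the whole phrase is synthesized in the voice copied from the pattern recording; the sketch only shows how the text side of a pattern relates to the final phrase.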

Requirements and restrictions SpeechKit Brand Voice Call Center

For speech synthesis, you need an audio file with your phrase pattern and the text with marked-up variables. To learn more about text requirements, see Requirements for synthesized texts.

The sampling frequency of the synthesized audio is 8 kHz. This is enough for phone calls. In other scenarios, however, you may hear noise and synthesis artifacts.

SpeechKit Brand Voice Call Center is designed for phone calls, so the texts for synthesis should be short. A synthesized phrase cannot be longer than 24 seconds, and its text, including the variable part, cannot exceed 250 characters. The variable part of a normalized text cannot be longer than 25% of the phrase. The same restriction applies to the duration of the variable part relative to the duration of the final audio.
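The limits above can be checked on your side before sending a request. The following is a rough sketch under stated assumptions: it counts raw characters, whereas the page refers to normalized text, so results near the limits may differ from what the service computes. The function name `check_phrase` and its parameters are hypothetical; durations are optional because they are only known if you can measure or estimate them.

```python
from typing import List, Optional

MAX_PHRASE_CHARS = 250      # phrase length limit, including the variable part
MAX_PHRASE_SECONDS = 24.0   # duration limit for a synthesized phrase
MAX_VARIABLE_SHARE = 0.25   # variable part: at most 25% of text and of duration


def check_phrase(
    phrase: str,
    variable_values: List[str],
    phrase_seconds: Optional[float] = None,
    variable_seconds: Optional[float] = None,
) -> List[str]:
    """Return the list of violated limits; an empty list means none were found."""
    problems = []
    if len(phrase) > MAX_PHRASE_CHARS:
        problems.append(f"phrase has {len(phrase)} characters, limit is {MAX_PHRASE_CHARS}")
    variable_chars = sum(len(v) for v in variable_values)
    if phrase and variable_chars / len(phrase) > MAX_VARIABLE_SHARE:
        problems.append("variable part exceeds 25% of the phrase text")
    if phrase_seconds is not None:
        if phrase_seconds > MAX_PHRASE_SECONDS:
            problems.append(f"phrase lasts {phrase_seconds:.1f} s, limit is 24 s")
        if (
            variable_seconds is not None
            and phrase_seconds > 0
            and variable_seconds / phrase_seconds > MAX_VARIABLE_SHARE
        ):
            problems.append("variable part exceeds 25% of the audio duration")
    return problems
```

For the example phrase from this page, `check_phrase("Hi Ann, I am calling from New Doors. My name is Matt. Is it a good time to talk?", ["Ann", "New Doors", "Matt"])` returns an empty list, since the three variable values make up well under a quarter of the text.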

SpeechKit Brand Voice Call Center logs your transmitted patterns (both text and audio). However, the synthesized audio recordings and variable parts, including your sensitive data, are not logged. To improve the model's performance with your data, you can enable variable logging through the x-data-logging-enabled: true header.

Note

Data logging can be useful when synthesis errors occur. If you do not want to log all data, include the logging header only in the requests that have issues, and first remove as much personal data from the variable part as possible.
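One way to follow this advice is to keep variable logging off by default and turn it on per request. The sketch below only builds the header set; the header name and value `x-data-logging-enabled: true` come from this page, while the `build_headers` helper and the `Api-Key` authorization scheme are assumptions for illustration. Check the SpeechKit API reference for the authentication method and for how your client passes headers (for gRPC, typically as call metadata).

```python
from typing import Dict


def build_headers(api_key: str, log_variables: bool = False) -> Dict[str, str]:
    """Build request headers; variable logging is opt-in per request."""
    # Assumption: API-key authorization. Replace with the scheme you actually use.
    headers = {"Authorization": f"Api-Key {api_key}"}
    if log_variables:
        # Header named on this page: enables logging of the variable parts.
        headers["x-data-logging-enabled"] = "true"
    return headers


# Normal requests: variable parts are not logged.
default_headers = build_headers("<your-api-key>")

# A request you are troubleshooting, after removing personal data from variables.
debug_headers = build_headers("<your-api-key>", log_variables=True)
```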

See also

  • SpeechKit Brand Voice API
  • Pattern-based speech synthesis
  • Pattern-based speech synthesis using the API v3
  • How AI helps with customer support: Case studies from banking, retail, and IT
  • From product cards to employee training: How AI transforms modern retail
