© 2025 Direct Cursus Technology L.L.C.

In this article:

  • SpeechKit Brand Voice Premium
  • SpeechKit Brand Voice Self Service
  • SpeechKit Brand Voice Call Center
  • Requirements and restrictions SpeechKit Brand Voice Call Center

Yandex SpeechKit Brand Voice

Written by
Yandex Cloud
Updated at April 11, 2025

The SpeechKit Brand Voice technology allows you to create unique voices for your speech synthesis model. It can synthesize both plain text and pattern-based text. Patterns contain phrases with variables that are replaced with prepared text. To cover different practical uses of trained models, Yandex Cloud offers three types of SpeechKit Brand Voice.

|  | Brand Voice Premium | Brand Voice Self Service | Brand Voice Call Center |
| --- | --- | --- | --- |
| Voice | Voice based on artist recordings | Voice based on artist recordings | Voice copy from the pattern |
| Usage | Full-text synthesis. Pattern-based synthesis. | Full-text synthesis. Pattern-based synthesis. | Pattern-based synthesis. The variable part should not exceed 25% of the pattern. The same restriction applies to the duration of the variable part relative to the duration of the final audio. |
| Emotions and roles | Copying emotions in pattern-based synthesis. Developing additional roles. | Copying emotions in pattern-based synthesis. | Copying emotions in pattern-based synthesis. |
| Sampling frequency in source audio recordings | 48 kHz | 48 kHz | 8 kHz or higher |
| Sampling frequency in synthesized audio recordings | 22 kHz | 22 kHz | 8 kHz |
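
Before sending recordings, it can help to confirm that their sampling frequency matches the table. The snippet below is a minimal sketch that uses Python's standard `wave` module. The tier labels (`premium`, `self_service`, `call_center`) and the function name are hypothetical names chosen for this example, not SpeechKit identifiers, and treating the table values as lower bounds is an assumption (the table states "or higher" only for Brand Voice Call Center).

```python
import wave

# Source sampling rates from the comparison table, in Hz.
# The tier keys are hypothetical labels for this sketch, not API identifiers.
# Assumption: each value is treated as a minimum.
MIN_SOURCE_RATE_HZ = {
    "premium": 48_000,
    "self_service": 48_000,
    "call_center": 8_000,
}


def source_rate_ok(wav_path: str, tier: str) -> bool:
    """Return True if the WAV file's sampling rate meets the tier's value."""
    with wave.open(wav_path, "rb") as wav_file:
        rate_hz = wav_file.getframerate()
    return rate_hz >= MIN_SOURCE_RATE_HZ[tier]
```

The check only reads the WAV header. It says nothing about recording quality, which still depends on the studio and the artist.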

To create a unique voice for your business, fill out the form.

SpeechKit Brand Voice Premium

SpeechKit Brand Voice Premium is suitable for any business task:

  • Voice assistants.
  • Call center robot operators.
  • Text-to-speech conversion of any text.

Creating a full-fledged model with a unique voice requires large amounts of audio recordings. Yandex Cloud experts will help you prepare the data for SpeechKit Brand Voice Premium model training, select a studio and an artist for you, and support you at each step of voice creation.

Once created, a SpeechKit Brand Voice Premium voice can be extended with additional roles.

SpeechKit Brand Voice Self Service

If you already have audio recordings for training the model, you can create a SpeechKit Brand Voice Self Service voice based on them. With such a voice, you can convert texts of any length into speech and synthesize speech using patterns. You can also use it to build voice assistants or robots for your call center.

You can add diverse emotions to your SpeechKit Brand Voice Self Service voice using pattern-based synthesis. In pattern-based synthesis, intonations are copied from your audio recordings.

Note

When using pattern-based synthesis with Yandex SpeechKit Brand Voice voices, make sure your patterns are recorded by the same artist who made recordings for your Yandex SpeechKit Brand Voice voice.

The quality of synthesized speech depends directly on the quality of audio recordings used to train the model. When creating a SpeechKit Brand Voice Self Service voice, you are in charge of the entire process of training data preparation.

For more information on how to train your own model, see Preparing and uploading data for Brand Voice Self Service.

SpeechKit Brand Voice Call Center

SpeechKit Brand Voice Call Center is purpose-designed for call center automation and other business scenarios involving phone calls:

  • Telemarketing.
  • Managing calls to level 1 technical support.
  • Surveys.
  • Call center automation.

With Brand Voice Call Center, you do not have to train a dedicated model on your artist's voice: the voice is copied directly from the patterns you provide for phrase generation. The speech is synthesized as a whole rather than spliced together from a pre-recorded pattern and a generated variable part.

You can use SpeechKit Brand Voice Call Center to automate your standard dialogs.

For example, if you have an audio recording of the phrase "Hi Michael, I am calling from Thunderclouds. My name is Anastasia. Is it a good time to talk?", you can transform it into "Hi Ann, I am calling from New Doors. My name is Matt. Is it a good time to talk?" without having to record any additional phrases.
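
The text side of this example can be sketched as a pattern with three variables. This is an illustration only: the `{name}`-style markup, the variable names, and the `fill_pattern` helper are assumptions made for this sketch, not the SpeechKit markup or API. The audio itself is produced by the service from your recorded pattern.

```python
import re

# Hypothetical markup: variables are written as {name} inside the pattern text.
PATTERN_TEXT = (
    "Hi {name}, I am calling from {company}. "
    "My name is {agent}. Is it a good time to talk?"
)


def fill_pattern(pattern: str, variables: dict[str, str]) -> str:
    """Replace each {placeholder} with its value; fail on a missing variable."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"no value for variable {key!r}")
        return variables[key]

    return re.sub(r"\{(\w+)\}", substitute, pattern)


# Text of the recorded pattern audio.
recorded = fill_pattern(
    PATTERN_TEXT,
    {"name": "Michael", "company": "Thunderclouds", "agent": "Anastasia"},
)
# Text of the phrase to synthesize, with new variable values.
target = fill_pattern(
    PATTERN_TEXT,
    {"name": "Ann", "company": "New Doors", "agent": "Matt"},
)
print(target)
```

Keeping the fixed part in one pattern string makes it easy to see that only the three variable values differ between the recorded phrase and the synthesized one.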

Requirements and restrictions SpeechKit Brand Voice Call Center

For speech synthesis, you need an audio file with your phrase pattern and the text with marked-up variables. To learn more about text requirements, see Requirements for synthesized texts.

The sampling frequency of the synthesized audio is 8 kHz. This is enough for phone calls, but in other scenarios you may hear noise and synthesis artifacts.

SpeechKit Brand Voice Call Center is designed for phone calls, so the texts for synthesis must be short. A synthesized phrase cannot be longer than 24 seconds, and its text, including the variable part, cannot exceed 250 characters. The variable part of the normalized text cannot be longer than 25% of the phrase. The same restriction applies to the duration of the variable part relative to the duration of the final audio.
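
The text limits above can be checked before a request is sent. The following is a rough sketch under stated assumptions: the `{name}` markup and the `check_phrase` helper are hypothetical, and the check counts raw characters, whereas the 25% limit applies to the normalized text, so treat the result as an approximation. The duration limits (24 seconds, and 25% of the final audio) depend on the audio and are not covered here.

```python
import re

MAX_CHARS = 250            # phrase length limit, including the variable part
MAX_VARIABLE_SHARE = 0.25  # variable part relative to the whole phrase


def check_phrase(pattern: str, variables: dict[str, str]) -> list[str]:
    """Return a list of violated text limits (empty if the text passes)."""
    problems = []
    # Assumption: every {name} in the pattern has a value in `variables`.
    phrase = re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], pattern)
    # Raw character count of all substituted values (not normalized text).
    variable_chars = sum(
        len(variables[name]) for name in re.findall(r"\{(\w+)\}", pattern)
    )
    if len(phrase) > MAX_CHARS:
        problems.append(
            f"phrase has {len(phrase)} characters, limit is {MAX_CHARS}"
        )
    if phrase and variable_chars / len(phrase) > MAX_VARIABLE_SHARE:
        problems.append("variable part exceeds 25% of the phrase")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit for a phrase at once.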

SpeechKit Brand Voice Call Center logs your transmitted patterns (both text and audio). However, the synthesized audio recordings and variable parts, including your sensitive data, are not logged. To improve the model's performance with your data, you can enable variable logging through the x-data-logging-enabled: true header.

Note

Data logging can help when synthesis errors occur. If you do not want to log all data, include the logging header only in the requests that have issues, and first remove as much personal data from the variable part as possible.
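
One way to follow this advice is to add the header per request rather than globally. The sketch below only builds a header mapping. The helper name and the base headers are hypothetical, the authorization value is a placeholder, and how headers (or gRPC metadata) are attached to a request depends on the client you use.

```python
LOGGING_HEADER = "x-data-logging-enabled"


def with_variable_logging(headers: dict[str, str], enabled: bool) -> dict[str, str]:
    """Return a copy of `headers`, adding the logging header only when enabled."""
    result = dict(headers)
    if enabled:
        result[LOGGING_HEADER] = "true"
    return result


# Hypothetical base headers; the authorization value is a placeholder.
base = {"authorization": "Bearer <IAM_TOKEN>"}

# Enable variable logging only for a request that is being investigated.
debug_headers = with_variable_logging(base, enabled=True)
```

Because the helper returns a copy, the shared base headers never carry the logging flag by accident.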

See also

  • SpeechKit Brand Voice API
  • Pattern-based speech synthesis
  • Pattern-based speech synthesis using the API v3
