SpeechKit Recognition API v3, gRPC: AsyncRecognizer.RecognizeFile
gRPC request
rpc RecognizeFile (RecognizeFileRequest) returns (yandex.cloud.operation.Operation)
RecognizeFileRequest
{
// Includes only one of the fields `content`, `uri`
"content": "bytes",
"uri": "string",
// end of the list of possible fields
"recognition_model": {
"model": "string",
"audio_format": {
// Includes only one of the fields `raw_audio`, `container_audio`
"raw_audio": {
"audio_encoding": "AudioEncoding",
"sample_rate_hertz": "int64",
"audio_channel_count": "int64"
},
"container_audio": {
"container_audio_type": "ContainerAudioType"
}
// end of the list of possible fields
},
"text_normalization": {
"text_normalization": "TextNormalization",
"profanity_filter": "bool",
"literature_text": "bool",
"phone_formatting_mode": "PhoneFormattingMode"
},
"language_restriction": {
"restriction_type": "LanguageRestrictionType",
"language_code": [
"string"
]
},
"audio_processing_type": "AudioProcessingType"
},
"recognition_classifier": {
"classifiers": [
{
"classifier": "string",
"triggers": [
"TriggerType"
]
}
]
},
"speech_analysis": {
"enable_speaker_analysis": "bool",
"enable_conversation_analysis": "bool",
"descriptive_statistics_quantiles": [
"double"
]
},
"speaker_labeling": {
"speaker_labeling": "SpeakerLabeling"
}
}
Field | Description
content | bytes. Bytes with data. Includes only one of the fields `content`, `uri`.
uri | string. S3 data URL. Includes only one of the fields `content`, `uri`.
recognition_model | RecognitionModelOptions. Configuration for the speech recognition model.
recognition_classifier | RecognitionClassifierOptions. Configuration for classifiers over speech recognition.
speech_analysis | SpeechAnalysisOptions. Configuration for speech analysis over speech recognition.
speaker_labeling | SpeakerLabelingOptions. Configuration for speaker labeling.
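The one-of rule for `content` and `uri` can be checked on the client before the call is made. The following is a minimal sketch, assuming the request is first assembled as a plain dictionary; the helper name `build_recognize_file_request` and the example URL are hypothetical and are not part of the SpeechKit API.

```python
from typing import Any, Dict, Optional


def build_recognize_file_request(
    recognition_model: Dict[str, Any],
    content: Optional[bytes] = None,
    uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a dict shaped like RecognizeFileRequest.

    Exactly one of `content` or `uri` must be given, mirroring the
    one-of constraint on these two fields.
    """
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of `content` or `uri`")
    request: Dict[str, Any] = {"recognition_model": recognition_model}
    if content is not None:
        request["content"] = content
    else:
        request["uri"] = uri
    return request


# Hypothetical storage URL, for illustration only.
request = build_recognize_file_request(
    recognition_model={"model": "general"},
    uri="https://storage.example.com/my-bucket/audio.wav",
)
```

Passing both fields, or neither, raises `ValueError` locally instead of surfacing as a server-side error.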
RecognitionModelOptions
Field | Description
model | string. Sets the recognition model for the cloud version of SpeechKit. Possible values: 'general', 'general:rc', 'general:deprecated'.
audio_format | AudioFormatOptions. Specifies the input audio format.
text_normalization | TextNormalizationOptions. Text normalization options.
language_restriction | LanguageRestrictionOptions. Possible languages in the audio.
audio_processing_type | enum AudioProcessingType. How to process the audio data (in real time, after all data is received, etc.). Default is REAL_TIME.
AudioFormatOptions
Audio format options.
Field | Description
raw_audio | RawAudio. Audio without a container. Includes only one of the fields `raw_audio`, `container_audio`.
container_audio | ContainerAudio. Audio wrapped in a container. Includes only one of the fields `raw_audio`, `container_audio`.
RawAudio
Raw audio format specification (no container to infer the type from). Used in AudioFormatOptions.
Field | Description
audio_encoding | enum AudioEncoding. Type of audio encoding.
sample_rate_hertz | int64. PCM sample rate.
audio_channel_count | int64. PCM channel count. Currently, only single-channel audio is supported in real-time recognition.
ContainerAudio
Audio with a fixed type in a container. Used in AudioFormatOptions.
Field | Description
container_audio_type | enum ContainerAudioType. Type of audio container.
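The two `audio_format` variants described above are mutually exclusive. As a minimal sketch, assuming plain dictionaries as the intermediate representation, the helpers below build each variant and check that exactly one is set. The helper names are hypothetical, and the enum value names `LINEAR16_PCM` and `WAV` are assumptions that are not listed in this section; check them against the AudioEncoding and ContainerAudioType enum references.

```python
from typing import Any, Dict


def raw_audio_format(
    audio_encoding: str,
    sample_rate_hertz: int,
    audio_channel_count: int = 1,
) -> Dict[str, Any]:
    """Build an `audio_format` value that uses the `raw_audio` variant."""
    if sample_rate_hertz <= 0 or audio_channel_count <= 0:
        raise ValueError("sample rate and channel count must be positive")
    return {
        "raw_audio": {
            "audio_encoding": audio_encoding,
            "sample_rate_hertz": sample_rate_hertz,
            "audio_channel_count": audio_channel_count,
        }
    }


def container_audio_format(container_audio_type: str) -> Dict[str, Any]:
    """Build an `audio_format` value that uses the `container_audio` variant."""
    return {"container_audio": {"container_audio_type": container_audio_type}}


def audio_format_variant(audio_format: Dict[str, Any]) -> str:
    """Return the name of the variant that is set; reject zero or both."""
    present = [k for k in ("raw_audio", "container_audio") if k in audio_format]
    if len(present) != 1:
        raise ValueError("set exactly one of `raw_audio` or `container_audio`")
    return present[0]


# Assumed enum value names; verify them against the enum reference.
pcm = raw_audio_format("LINEAR16_PCM", sample_rate_hertz=16000)
wav = container_audio_format("WAV")
```

Raw audio needs the encoding, sample rate, and channel count spelled out because there is no container to infer them from; container audio only needs the container type.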
TextNormalizationOptions
Text normalization options.
Field | Description
text_normalization | enum TextNormalization.
profanity_filter | bool. Profanity filter (default: false).
literature_text | bool. Rewrite text in literature style (default: false).
phone_formatting_mode | enum PhoneFormattingMode. Defines the phone formatting mode.
LanguageRestrictionOptions
Type of restriction for the list of languages expected in the incoming speech stream.
Field | Description
restriction_type | enum LanguageRestrictionType. Language restriction type.
language_code[] | string. The list of language codes to restrict recognition in the case of an auto model.
RecognitionClassifierOptions
Field | Description
classifiers[] | RecognitionClassifier. List of classifiers to use.
RecognitionClassifier
Field | Description
classifier | string. Classifier name.
triggers[] | enum TriggerType. Describes the types of responses in which the classification results will arrive.
SpeechAnalysisOptions
Field | Description
enable_speaker_analysis | bool. Analyse speech for every speaker.
enable_conversation_analysis | bool. Analyse conversation of two speakers.
descriptive_statistics_quantiles[] | double. Quantile levels in range (0, 1) for descriptive statistics.
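Because quantile levels must fall in the range (0, 1), it can help to validate them before sending the request. This is a minimal sketch, assuming the range is an open interval (both 0 and 1 excluded) and that the options are assembled as a plain dictionary; the helper name `speech_analysis_options` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def speech_analysis_options(
    quantiles: Iterable[float],
    enable_speaker_analysis: bool = True,
    enable_conversation_analysis: bool = False,
) -> Dict[str, Any]:
    """Return a dict shaped like SpeechAnalysisOptions.

    Every quantile level is checked against the open interval (0, 1).
    """
    levels: List[float] = [float(q) for q in quantiles]
    for level in levels:
        if not 0.0 < level < 1.0:
            raise ValueError(f"quantile level {level} is outside (0, 1)")
    return {
        "enable_speaker_analysis": enable_speaker_analysis,
        "enable_conversation_analysis": enable_conversation_analysis,
        "descriptive_statistics_quantiles": levels,
    }


# Request the lower quartile, the median, and the 90th percentile.
options = speech_analysis_options([0.25, 0.5, 0.9])
```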
SpeakerLabelingOptions
Field | Description
speaker_labeling | enum SpeakerLabeling. Specifies the execution of speaker labeling. Default is SPEAKER_LABELING_DISABLED.
yandex.cloud.operation.Operation
{
"id": "string",
"description": "string",
"created_at": "google.protobuf.Timestamp",
"created_by": "string",
"modified_at": "google.protobuf.Timestamp",
"done": "bool",
"metadata": "google.protobuf.Any",
// Includes only one of the fields `error`, `response`
"error": "google.rpc.Status",
"response": "google.protobuf.Empty"
// end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
created_at | google.protobuf.Timestamp. Creation timestamp.
created_by | string. ID of the user or service account who initiated the operation.
modified_at | google.protobuf.Timestamp. The time when the Operation resource was last modified.
done | bool. If the value is `false`, the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
metadata | google.protobuf.Any. Service-specific metadata associated with the operation.
error | google.rpc.Status. The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response`. The operation result.
response | google.protobuf.Empty. The normal response of the operation in case of success. Includes only one of the fields `error`, `response`. The operation result.
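Since RecognizeFile returns an Operation rather than the recognition result itself, a client typically polls the operation until `done` is set and then checks which of `error` or `response` is present. The sketch below illustrates that loop under stated assumptions: `fetch` is a hypothetical callable that returns the current Operation as a dictionary (how the status is actually retrieved is outside this section), and the polling interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_for_operation(
    fetch: Callable[[str], Dict[str, Any]],
    operation_id: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch` until the operation reports done=True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        operation = fetch(operation_id)
        if operation.get("done"):
            # A finished operation carries either `error` or `response`.
            if "error" in operation:
                raise RuntimeError(
                    f"operation {operation_id} failed: {operation['error']}"
                )
            return operation
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} did not finish in time")
        sleep(poll_interval)


# Stub standing in for a real status lookup: in progress twice, then success.
_states = iter([
    {"id": "op-1", "done": False},
    {"id": "op-1", "done": False},
    {"id": "op-1", "done": True, "response": {}},
])
result = wait_for_operation(lambda _id: next(_states), "op-1", sleep=lambda _s: None)
```

Injecting `sleep` keeps the loop easy to test; production code would leave the default `time.sleep` in place.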