SpeechKit Recognition API v3, gRPC: AsyncRecognizer.RecognizeFile
gRPC request
rpc RecognizeFile (RecognizeFileRequest) returns (yandex.cloud.operation.Operation)
RecognizeFileRequest
{
// Includes only one of the fields `content`, `uri`
"content": "bytes",
"uri": "string",
// end of the list of possible fields
"recognitionModel": {
"model": "string",
"audioFormat": {
// Includes only one of the fields `rawAudio`, `containerAudio`
"rawAudio": {
"audioEncoding": "AudioEncoding",
"sampleRateHertz": "int64",
"audioChannelCount": "int64"
},
"containerAudio": {
"containerAudioType": "ContainerAudioType"
}
// end of the list of possible fields
},
"textNormalization": {
"textNormalization": "TextNormalization",
"profanityFilter": "bool",
"literatureText": "bool",
"phoneFormattingMode": "PhoneFormattingMode"
},
"languageRestriction": {
"restrictionType": "LanguageRestrictionType",
"languageCode": [
"string"
]
},
"audioProcessingType": "AudioProcessingType"
},
"recognitionClassifier": {
"classifiers": [
{
"classifier": "string",
"triggers": [
"TriggerType"
]
}
]
},
"speechAnalysis": {
"enableSpeakerAnalysis": "bool",
"enableConversationAnalysis": "bool",
"descriptiveStatisticsQuantiles": [
"double"
]
},
"speakerLabeling": {
"speakerLabeling": "SpeakerLabeling"
}
}
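As a minimal sketch, the hypothetical helper below assembles the JSON form of a RecognizeFileRequest (the camelCase field names in the schema above) and enforces the `content`/`uri` oneof on the client side. A real gRPC client would fill the generated protobuf message instead of a dict; the bucket URL in the usage line is a placeholder.

```python
import base64
import json
from typing import Any, Optional


def build_recognize_file_request(
    model: str,
    content: Optional[bytes] = None,
    uri: Optional[str] = None,
) -> dict[str, Any]:
    """Build a RecognizeFileRequest in JSON form (hypothetical helper).

    Exactly one of `content` or `uri` must be given, mirroring the oneof.
    """
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of `content` or `uri`")
    request: dict[str, Any] = {"recognitionModel": {"model": model}}
    if content is not None:
        # The protobuf JSON mapping encodes bytes fields as base64 strings.
        request["content"] = base64.b64encode(content).decode("ascii")
    else:
        request["uri"] = uri
    return request


# Placeholder object URL; "my-bucket" is hypothetical.
req = build_recognize_file_request(
    "general", uri="https://storage.yandexcloud.net/my-bucket/audio.ogg"
)
print(json.dumps(req, indent=2))
```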
Field | Description
--- | ---
content | bytes. Audio data bytes. Includes only one of the fields `content`, `uri`.
uri | string. S3 data URL. Includes only one of the fields `content`, `uri`.
recognitionModel | Configuration for the speech recognition model.
recognitionClassifier | Configuration for classifiers over speech recognition.
speechAnalysis | Configuration for speech analysis over speech recognition.
speakerLabeling | Configuration for speaker labeling.
RecognitionModelOptions
Field | Description
--- | ---
model | string. Sets the recognition model for the cloud version of SpeechKit. Possible values: `general`, `general:rc`, `general:deprecated`.
audioFormat | Input audio format.
textNormalization | Text normalization options.
languageRestriction | Languages that may occur in the audio.
audioProcessingType | enum AudioProcessingType. How to process audio data (in real time, after all data is received, etc.). Default is `REAL_TIME`.
AudioFormatOptions
Audio format options.
Field | Description
--- | ---
rawAudio | Audio without a container. Includes only one of the fields `rawAudio`, `containerAudio`.
containerAudio | Audio wrapped in a container. Includes only one of the fields `rawAudio`, `containerAudio`.
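Since `rawAudio` and `containerAudio` are mutually exclusive, one way to keep client code from ever setting both is to give each branch its own constructor. A sketch in Python with hypothetical helper names; the enum strings in the usage lines (`LINEAR16_PCM`, `OGG_OPUS`) are assumed values of AudioEncoding and ContainerAudioType, which this page does not list.

```python
from typing import Any


def raw_audio_format(encoding: str, sample_rate_hz: int, channels: int = 1) -> dict[str, Any]:
    """AudioFormatOptions with only the rawAudio branch set (no container)."""
    return {
        "rawAudio": {
            "audioEncoding": encoding,
            "sampleRateHertz": sample_rate_hz,
            "audioChannelCount": channels,
        }
    }


def container_audio_format(container_type: str) -> dict[str, Any]:
    """AudioFormatOptions with only the containerAudio branch set."""
    return {"containerAudio": {"containerAudioType": container_type}}


# Assumed enum values: LINEAR16_PCM (AudioEncoding), OGG_OPUS (ContainerAudioType).
print(raw_audio_format("LINEAR16_PCM", 8000))
print(container_audio_format("OGG_OPUS"))
```

Each constructor returns an object with exactly one branch populated, so the oneof constraint holds by construction rather than by a runtime check.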
RawAudio
Raw audio format specification (there is no container from which to infer the type). Used in AudioFormatOptions.
Field | Description
--- | ---
audioEncoding | enum AudioEncoding. Type of audio encoding.
sampleRateHertz | int64. PCM sample rate.
audioChannelCount | int64. PCM channel count. Currently, only single-channel audio is supported in real-time recognition.
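Because raw audio carries no header, the byte stream can only be interpreted through the declared encoding, `sampleRateHertz`, and `audioChannelCount`. The arithmetic tying them together, duration = bytes / (sampleRate × channels × bytesPerSample), can be sketched as follows. The default of 2 bytes per sample assumes 16-bit PCM, which is an assumption about the encoding rather than something this table states.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate_hz: int,
    channels: int = 1,
    bytes_per_sample: int = 2,
) -> float:
    """Duration of headerless PCM audio with fixed-width samples.

    bytes_per_sample=2 corresponds to 16-bit samples (assumed default).
    """
    frame_size = channels * bytes_per_sample  # bytes per sampling instant, all channels
    if num_bytes % frame_size:
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes / (sample_rate_hz * frame_size)


# 8 kHz, mono, 16-bit: 16,000 bytes per second of audio.
print(pcm_duration_seconds(160_000, 8000))  # 10.0
```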
ContainerAudio
Audio of a fixed type wrapped in a container. Used in AudioFormatOptions.
Field | Description
--- | ---
containerAudioType | enum ContainerAudioType. Type of audio container.
TextNormalizationOptions
Text normalization options.
Field | Description
--- | ---
textNormalization | enum TextNormalization.
profanityFilter | bool. Profanity filter (default: `false`).
literatureText | bool. Rewrite text in literary style (default: `false`).
phoneFormattingMode | enum PhoneFormattingMode. Phone number formatting mode.
LanguageRestrictionOptions
Type of restriction for the list of languages expected in the incoming speech stream.
Field | Description
--- | ---
restrictionType | enum LanguageRestrictionType. Language restriction type.
languageCode[] | string. The list of language codes to which recognition is restricted when an auto model is used.
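A minimal sketch of LanguageRestrictionOptions in JSON form. `WHITELIST` in the usage line is an assumed LanguageRestrictionType value (the enum members are not listed on this page), and rejecting an empty language list is a client-side choice, not a documented server rule.

```python
from typing import Any


def language_restriction(restriction_type: str, codes: list[str]) -> dict[str, Any]:
    """Build LanguageRestrictionOptions in JSON form (hypothetical helper)."""
    if not codes:
        # Client-side choice: an empty list is most likely a caller mistake.
        raise ValueError("languageCode must contain at least one code")
    return {"restrictionType": restriction_type, "languageCode": list(codes)}


# "WHITELIST" is an assumed LanguageRestrictionType value.
print(language_restriction("WHITELIST", ["ru-RU", "en-US"]))
```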
RecognitionClassifierOptions
Field | Description
--- | ---
classifiers[] | List of classifiers to use.
RecognitionClassifier
Field | Description
--- | ---
classifier | string. Classifier name.
triggers[] | enum TriggerType. Types of responses in which the classification results are returned.
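The nested classifier structure can be generated from a plain mapping of classifier name to trigger types. In this sketch the helper name is hypothetical, and the classifier name `insult` and trigger `ON_UTTERANCE` in the usage line are illustrative assumptions; check the classifier and TriggerType lists for your SpeechKit version.

```python
from typing import Any


def classifier_options(spec: dict[str, list[str]]) -> dict[str, Any]:
    """Build RecognitionClassifierOptions from {classifier name: trigger types}."""
    return {
        "classifiers": [
            {"classifier": name, "triggers": list(triggers)}
            for name, triggers in spec.items()
        ]
    }


# Assumed classifier name and TriggerType value.
print(classifier_options({"insult": ["ON_UTTERANCE"]}))
```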
SpeechAnalysisOptions
Field | Description
--- | ---
enableSpeakerAnalysis | bool. Analyze speech for every speaker.
enableConversationAnalysis | bool. Analyze the conversation between two speakers.
descriptiveStatisticsQuantiles[] | double. Quantile levels in the range (0, 1) for descriptive statistics.
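The one constraint stated here, that every quantile level lies strictly between 0 and 1, is easy to check before the request is sent. A sketch with a hypothetical helper name that builds SpeechAnalysisOptions in JSON form:

```python
from typing import Any


def speech_analysis_options(
    speaker_analysis: bool,
    conversation_analysis: bool,
    quantiles: list[float],
) -> dict[str, Any]:
    """Build SpeechAnalysisOptions, rejecting quantile levels outside (0, 1)."""
    # The interval is open: 0 and 1 are rejected, and so is NaN,
    # because every comparison involving NaN is False.
    bad = [q for q in quantiles if not 0.0 < q < 1.0]
    if bad:
        raise ValueError(f"quantile levels must lie in (0, 1), got {bad}")
    return {
        "enableSpeakerAnalysis": speaker_analysis,
        "enableConversationAnalysis": conversation_analysis,
        "descriptiveStatisticsQuantiles": list(quantiles),
    }


print(speech_analysis_options(True, True, [0.5, 0.9]))
```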
SpeakerLabelingOptions
Field | Description
--- | ---
speakerLabeling | enum SpeakerLabeling. Specifies whether speaker labeling is performed. Default is `SPEAKER_LABELING_DISABLED`.
yandex.cloud.operation.Operation
{
"id": "string",
"description": "string",
"createdAt": "google.protobuf.Timestamp",
"createdBy": "string",
"modifiedAt": "google.protobuf.Timestamp",
"done": "bool",
"metadata": "google.protobuf.Any",
// Includes only one of the fields `error`, `response`
"error": "google.rpc.Status",
"response": "google.protobuf.Empty"
// end of the list of possible fields
}
An Operation resource. For more information, see Operation.
Field | Description
--- | ---
id | string. ID of the operation.
description | string. Description of the operation. 0-256 characters long.
createdAt | Creation timestamp.
createdBy | string. ID of the user or service account who initiated the operation.
modifiedAt | The time when the Operation resource was last modified.
done | bool. If the value is `false`, the operation is still in progress. If `true`, the operation is completed, and either `error` or `response` is available.
metadata | Service-specific metadata associated with the operation.
error | The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`, `response` (the operation result).
response | The normal response of the operation in case of success. Includes only one of the fields `error`, `response` (the operation result).
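Because RecognizeFile returns an Operation rather than a result, a client typically polls the operation until `done` is true and then checks which of `error` or `response` is set. The sketch below assumes a caller-supplied `get_operation` function (hypothetical) that returns the current Operation in JSON form; how it is fetched is left to the caller. Since `response` is declared as google.protobuf.Empty here, a successful operation itself carries no recognition text.

```python
import time
from typing import Any, Callable


def wait_for_operation(
    get_operation: Callable[[str], dict[str, Any]],
    operation_id: str,
    poll_interval_s: float = 5.0,
    timeout_s: float = 3600.0,
) -> dict[str, Any]:
    """Poll an Operation (JSON form) until `done` is true.

    `get_operation` is a hypothetical caller-supplied fetcher.
    Raises RuntimeError if the finished operation carries `error`.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        op = get_operation(operation_id)
        # In JSON form, `done: false` may simply be omitted.
        if op.get("done"):
            # A finished operation has exactly one of `error` / `response`.
            if "error" in op:
                err = op["error"]  # google.rpc.Status: code, message, details
                raise RuntimeError(
                    f"operation {operation_id} failed: {err.get('code')} {err.get('message')}"
                )
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} not done after {timeout_s} s")
        time.sleep(poll_interval_s)
```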