SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile
HTTP request
POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
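As a rough sketch, the request could be prepared with Python's standard library as follows. The `build_request` helper is hypothetical, and the `Authorization: Api-Key ...` header scheme is an assumption that this section does not state; check the service's authentication documentation for the scheme that applies to your account.

```python
import json
import urllib.request

# Endpoint taken from the "HTTP request" line of this reference.
ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for RecognizeFile.

    Assumption: the service accepts an `Authorization: Api-Key <key>` header.
    """
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is the Operation JSON described under Response.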
Body parameters
{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  }
}
| Field | Description |
| --- | --- |
| content | string (bytes) Bytes with data. Includes only one of the fields `content`, `uri`. |
| uri | string S3 data URL. Includes only one of the fields `content`, `uri`. |
| recognitionModel | RecognitionModelOptions Configuration for the speech recognition model. |
| recognitionClassifier | RecognitionClassifierOptions Configuration for classifiers over speech recognition. |
| speechAnalysis | SpeechAnalysisOptions Configuration for speech analysis over speech recognition. |
| speakerLabeling | SpeakerLabelingOptions Configuration for speaker labeling. |
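Because `content` and `uri` are mutually exclusive, a client may want to enforce that before sending anything. The following is a minimal sketch; the `build_body` helper is hypothetical, and base64 encoding of the `bytes` field is an assumption based on the usual JSON representation of protobuf `bytes`, not something this section states.

```python
import base64
from typing import Optional


def build_body(content: Optional[bytes] = None,
               uri: Optional[str] = None,
               model: str = "general") -> dict:
    """Build a RecognizeFile request body with exactly one audio source."""
    # Reject both "neither set" and "both set".
    if (content is None) == (uri is None):
        raise ValueError("Set exactly one of `content` or `uri`")

    body: dict = {"recognitionModel": {"model": model}}
    if content is not None:
        # Assumption: `bytes` fields travel as base64 text in the JSON body.
        body["content"] = base64.b64encode(content).decode("ascii")
    else:
        body["uri"] = uri
    return body
```

For example, `build_body(uri="https://example.com/audio.wav")` yields a body that carries only `uri` and the default `general` model; the URL here is a placeholder.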
RecognitionModelOptions
| Field | Description |
| --- | --- |
| model | string Sets the recognition model for the cloud version of SpeechKit. Possible values: 'general', 'general:rc', 'general:deprecated'. |
| audioFormat | AudioFormatOptions Specifies the input audio format. |
| textNormalization | TextNormalizationOptions Text normalization options. |
| languageRestriction | LanguageRestrictionOptions Possible languages in the audio. |
| audioProcessingType | enum (AudioProcessingType) How to process the audio data (in real time, after all data is received, etc.). Default is REAL_TIME. |
AudioFormatOptions
Audio format options.
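| Field | Description |
| --- | --- |
| rawAudio | RawAudio Audio without a container. Includes only one of the fields `rawAudio`, `containerAudio`. |
| containerAudio | ContainerAudio Audio wrapped in a container. Includes only one of the fields `rawAudio`, `containerAudio`. |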
RawAudio
RAW Audio format spec (no container to infer type). Used in AudioFormat options.
| Field | Description |
| --- | --- |
| audioEncoding | enum (AudioEncoding) Type of audio encoding. |
| sampleRateHertz | string (int64) PCM sample rate. |
| audioChannelCount | string (int64) PCM channel count. Currently only single channel audio is supported in real-time recognition. |
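Note that the `int64` fields are typed as strings in the JSON schema, so numeric values may need converting before serialization. A minimal sketch, assuming the hypothetical helper name `raw_audio_format` and the enum value `LINEAR16_PCM`, which is not listed in this section:

```python
def raw_audio_format(sample_rate_hz: int,
                     channels: int = 1,
                     encoding: str = "LINEAR16_PCM") -> dict:
    """Build an `audioFormat` object that uses the `rawAudio` variant."""
    if sample_rate_hz <= 0 or channels <= 0:
        raise ValueError("Sample rate and channel count must be positive")
    return {
        "rawAudio": {
            "audioEncoding": encoding,  # assumed enum value, verify before use
            # int64 fields are represented as strings in the JSON schema.
            "sampleRateHertz": str(sample_rate_hz),
            "audioChannelCount": str(channels),
        }
    }
```

The returned dictionary is meant to be placed under `recognitionModel.audioFormat`; since it contains only `rawAudio`, it respects the one-of constraint with `containerAudio`.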
ContainerAudio
Audio with fixed type in container. Used in AudioFormat options.
| Field | Description |
| --- | --- |
| containerAudioType | enum (ContainerAudioType) Type of audio container. |
TextNormalizationOptions
Text normalization options.

| Field | Description |
| --- | --- |
| textNormalization | enum (TextNormalization) |
| profanityFilter | boolean Profanity filter (default: false). |
| literatureText | boolean Rewrite text in literature style (default: false). |
| phoneFormattingMode | enum (PhoneFormattingMode) Defines the phone formatting mode. |
LanguageRestrictionOptions
Type of restriction for the list of languages expected in the incoming speech stream.
| Field | Description |
| --- | --- |
| restrictionType | enum (LanguageRestrictionType) Language restriction type. |
| languageCode[] | string The list of language codes to restrict recognition to in the case of an auto model. |
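As an illustration, a `languageRestriction` object could be assembled like this. The helper name `language_restriction` is hypothetical, and both the `WHITELIST` enum value and the language code format in the usage note are assumptions, since this section does not list them.

```python
from typing import List


def language_restriction(codes: List[str],
                         restriction_type: str = "WHITELIST") -> dict:
    """Build a `languageRestriction` object from a list of language codes."""
    if not codes:
        raise ValueError("Provide at least one language code")
    return {
        "restrictionType": restriction_type,  # assumed enum value
        "languageCode": list(codes),          # copy so the caller's list is not shared
    }
```

A call such as `language_restriction(["ru-RU", "en-US"])` produces an object for `recognitionModel.languageRestriction`.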
RecognitionClassifierOptions
| Field | Description |
| --- | --- |
| classifiers[] | RecognitionClassifier List of classifiers to use. |
RecognitionClassifier
| Field | Description |
| --- | --- |
| classifier | string Classifier name. |
| triggers[] | enum (TriggerType) Describes the types of responses to which the classification results will come. |
SpeechAnalysisOptions
| Field | Description |
| --- | --- |
| enableSpeakerAnalysis | boolean Analyse speech for every speaker. |
| enableConversationAnalysis | boolean Analyse conversation of two speakers. |
| descriptiveStatisticsQuantiles[] | string Quantile levels in range (0, 1) for descriptive statistics. |
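Since quantile levels must lie strictly between 0 and 1 and the schema types them as strings, a client-side check can catch mistakes before the request is sent. A minimal sketch, with the hypothetical helper name `speech_analysis_options`:

```python
from typing import Iterable


def speech_analysis_options(quantiles: Iterable[float],
                            speaker_analysis: bool = True,
                            conversation_analysis: bool = False) -> dict:
    """Build a `speechAnalysis` object, validating quantile levels."""
    levels = list(quantiles)
    for q in levels:
        # The range (0, 1) is open: 0 and 1 themselves are rejected.
        if not 0.0 < q < 1.0:
            raise ValueError(f"Quantile level {q} is outside (0, 1)")
    return {
        "enableSpeakerAnalysis": speaker_analysis,
        "enableConversationAnalysis": conversation_analysis,
        # The JSON schema types each quantile level as a string.
        "descriptiveStatisticsQuantiles": [str(q) for q in levels],
    }
```

For instance, `speech_analysis_options([0.5, 0.9])` requests the median and the 0.9 quantile level.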
SpeakerLabelingOptions
| Field | Description |
| --- | --- |
| speakerLabeling | enum (SpeakerLabeling) Specifies the execution of speaker labeling. Default is SPEAKER_LABELING_DISABLED. |
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
| Field | Description |
| --- | --- |
| id | string ID of the operation. |
| description | string Description of the operation. 0-256 characters long. |
| createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. |
| createdBy | string ID of the user or service account who initiated the operation. |
| modifiedAt | string (date-time) The time when the Operation resource was last modified. String in RFC3339 text format. |
| done | boolean If the value is `false`, the operation is still in progress. If `true`, the operation is completed. |
| metadata | object Service-specific metadata associated with the operation. |
| error | Status The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`. The operation result. |
Status
The error result of the operation in case of failure or cancellation.
| Field | Description |
| --- | --- |
| code | integer (int32) Error code. An enum value of google.rpc.Code. |
| message | string An error message. |
| details[] | object A list of messages that carry the error details. |
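A client would typically inspect the returned Operation for an `error` before trusting `done`. The sketch below shows one way to do that; `OperationFailed`, `check_operation`, and `parse_timestamp` are hypothetical names, and treating a missing `done` field as `false` is an assumption.

```python
import json
from datetime import datetime


class OperationFailed(Exception):
    """Raised when the Operation carries an `error` (Status) object."""

    def __init__(self, code: int, message: str):
        super().__init__(f"operation failed with code {code}: {message}")
        self.code = code
        self.message = message


def check_operation(raw: str) -> dict:
    """Parse an Operation JSON document and raise if it reports an error."""
    op = json.loads(raw)
    error = op.get("error")
    if error:
        raise OperationFailed(error.get("code", 0), error.get("message", ""))
    # Assumption: an absent `done` field means the operation is not finished.
    op["done"] = bool(op.get("done", False))
    return op


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC3339 timestamp such as `createdAt` or `modifiedAt`.

    This sketch handles whole-second, millisecond, and microsecond
    precision; finer precision would need truncating first.
    """
    # Older Python versions do not accept the "Z" suffix in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

The `id` of the returned operation is what a caller would keep in order to look up the result later.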