SpeechKit Recognition API v3, REST: AsyncRecognizer.recognizeFile
HTTP request
POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
Body parameters
{
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // `recognitionModel.audioFormat` includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      },
      // end of the list of possible fields `recognitionModel.audioFormat`
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": true,
      "literatureText": true,
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": true,
    "enableConversationAnalysis": true,
    "descriptiveStatisticsQuantiles": [
      "number"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  },
  // includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
}
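As a rough illustration of how the body schema maps onto an actual call, the sketch below assembles a minimal JSON body and prepares a POST request with Python's standard `urllib`. The endpoint URL and the `general` model value come from this page; the `Authorization` header value, the container type, and the S3 URL are placeholders, because this section does not list the authentication scheme or the enum values. Treat it as a sketch under those assumptions, not as the official client.

```python
import json
import urllib.request

# Endpoint as given in the "HTTP request" section.
ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_request(body: dict, auth_header: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for recognizeFileAsync."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the caller supplies a valid Authorization value;
            # the scheme is not described in this section.
            "Authorization": auth_header,
        },
    )


# Placeholder values: enum strings and the S3 URL are not listed here.
body = {
    "recognitionModel": {
        "model": "general",
        "audioFormat": {
            "containerAudio": {"containerAudioType": "<CONTAINER_TYPE>"}
        },
    },
    "uri": "<S3_DATA_URL>",
}

request = build_request(body, "<AUTH_HEADER_VALUE>")
```

Sending the prepared request with `urllib.request.urlopen(request)` would return the Operation JSON described in the Response section.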
Field | Description |
---|---|
recognitionModel | object Configuration for the speech recognition model. |
recognitionModel.model | string Sets the recognition model for the cloud version of SpeechKit. Possible values: 'general', 'general:rc', 'general:deprecated'. The model is ignored for SpeechKit Hybrid. |
recognitionModel.audioFormat | object Audio format options for the input audio. |
recognitionModel.audioFormat.rawAudio | object Audio without a container. recognitionModel.audioFormat includes only one of the fields rawAudio, containerAudio. |
recognitionModel.audioFormat.rawAudio.audioEncoding | string Type of audio encoding. |
recognitionModel.audioFormat.rawAudio.sampleRateHertz | string (int64) PCM sample rate. |
recognitionModel.audioFormat.rawAudio.audioChannelCount | string (int64) PCM channel count. Currently only single-channel audio is supported in real-time recognition. |
recognitionModel.audioFormat.containerAudio | object Audio wrapped in a container. recognitionModel.audioFormat includes only one of the fields rawAudio, containerAudio. |
recognitionModel.audioFormat.containerAudio.containerAudioType | string Type of audio container. |
recognitionModel.textNormalization | object Text normalization options. |
recognitionModel.textNormalization.textNormalization | string Normalization. |
recognitionModel.textNormalization.profanityFilter | boolean (boolean) Profanity filter (default: false). |
recognitionModel.textNormalization.literatureText | boolean (boolean) Rewrite text in literary style (default: false). |
recognitionModel.textNormalization.phoneFormattingMode | string Defines the phone formatting mode. |
recognitionModel.languageRestriction | object Possible languages in the audio: a restriction on the list of languages expected in the incoming speech stream. |
recognitionModel.languageRestriction.restrictionType | string Language restriction type. |
recognitionModel.languageRestriction.languageCode[] | string The list of language codes to restrict recognition to in the case of an auto model. |
recognitionModel.audioProcessingType | string How to process the audio data (in real time, after all data is received, etc.). Default is REAL_TIME. |
recognitionClassifier | object Configuration for classifiers over speech recognition. |
recognitionClassifier.classifiers[] | object List of classifiers to use. |
recognitionClassifier.classifiers[].classifier | string Classifier name. |
recognitionClassifier.classifiers[].triggers[] | string Describes the types of responses with which the classification results are returned. |
speechAnalysis | object Configuration for speech analysis over speech recognition. |
speechAnalysis.enableSpeakerAnalysis | boolean (boolean) Analyse speech for every speaker. |
speechAnalysis.enableConversationAnalysis | boolean (boolean) Analyse the conversation of two speakers. |
speechAnalysis.descriptiveStatisticsQuantiles[] | number (double) Quantile levels in the range (0, 1) for descriptive statistics. |
speakerLabeling | object Configuration for speaker labeling. |
speakerLabeling.speakerLabeling | string Specifies the execution of speaker labeling. Default is SPEAKER_LABELING_DISABLED. |
content | string (byte) Bytes with data. Includes only one of the fields content, uri. |
uri | string S3 data URL. Includes only one of the fields content, uri. |
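The "includes only one of the fields" constraints and the (0, 1) range for quantiles can be checked on the client before a request is sent. The helpers below are a minimal sketch of such a check; the function names `one_of`, `validate_body`, and `inline_content` are hypothetical, and the base64 encoding of `content` is an assumption based on the usual JSON mapping of a `string (byte)` field.

```python
import base64


def one_of(obj: dict, *names: str) -> str:
    """Return the single field from `names` present in `obj`, else raise."""
    present = [name for name in names if name in obj]
    if len(present) != 1:
        raise ValueError(f"exactly one of {names} must be set, got {present}")
    return present[0]


def validate_body(body: dict) -> None:
    """Check the mutually exclusive fields and the quantile range."""
    # Top level: `content` or `uri`, never both.
    one_of(body, "content", "uri")

    # Audio format: `rawAudio` or `containerAudio`, when the object is given.
    audio_format = body.get("recognitionModel", {}).get("audioFormat")
    if audio_format is not None:
        one_of(audio_format, "rawAudio", "containerAudio")

    # Quantile levels must lie strictly inside (0, 1).
    analysis = body.get("speechAnalysis", {})
    for q in analysis.get("descriptiveStatisticsQuantiles", []):
        if not 0 < q < 1:
            raise ValueError(f"quantile {q} is outside (0, 1)")


def inline_content(audio: bytes) -> str:
    """Encode raw audio bytes for `content` (assumes base64 is expected)."""
    return base64.b64encode(audio).decode("ascii")
```

Calling `validate_body` right before serializing the body surfaces a conflicting `content`/`uri` pair locally instead of as a server-side error.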
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": true,
  "metadata": "object",
  // includes only one of the fields `error`, `response`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  },
  "response": "object",
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
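Because `error` and `response` are mutually exclusive, client code usually branches on the Operation fields in a fixed order. The function below is a minimal sketch of that branching; `summarize_operation` is a hypothetical name, and it assumes that `done` stays false (or absent) while recognition is still running.

```python
from typing import Any, Tuple


def summarize_operation(operation: dict) -> Tuple[str, Any]:
    """Classify an Operation dict as pending, failed, or succeeded."""
    # Assumption: `done` is false (or absent) while the operation is running.
    if not operation.get("done", False):
        return "pending", operation.get("metadata")
    # A finished operation carries either `error` or `response`.
    if "error" in operation:
        return "failed", operation["error"]
    return "succeeded", operation.get("response")


# Hypothetical payloads, for illustration only.
running = {"id": "op-1", "done": False}
failed = {"id": "op-2", "done": True, "error": {"code": 3, "message": "bad audio"}}

status, payload = summarize_operation(failed)
```

Returning a status string together with the payload keeps the caller from confusing "still running" with "finished with an empty response".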
Field | Description |
---|---|
id | string ID of the operation. |
description | string Description of the operation. 0-256 characters long. |
createdAt | string (date-time) Creation timestamp. String in RFC3339 text format. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
createdBy | string ID of the user or service account who initiated the operation. |
modifiedAt | string (date-time) The time when the Operation resource was last modified. String in RFC3339 text format. To work with values in this field, use the APIs described in the Protocol Buffers reference. In some languages, built-in datetime utilities do not support nanosecond precision (9 digits). |
done | boolean (boolean) If the value is false, the operation is still in progress. If true, the operation is completed, and either error or response is available. |
metadata | object Service-specific metadata associated with the operation. It typically contains the ID of the target resource that the operation is performed on. Any method that returns a long-running operation should document the metadata type, if any. |
error | object The error result of the operation in case of failure or cancellation. Includes only one of the fields error, response. |
error.code | integer (int32) Error code. An enum value of google.rpc.Code. |
error.message | string An error message. |
error.details[] | object A list of messages that carry the error details. |
response | object The normal response of the operation in case of success. If the original method returns no data on success, such as Delete, the response is google.protobuf.Empty. If the original method is the standard Create/Update, the response should be the target resource of the operation. Any method that returns a long-running operation should document the response type, if any. Includes only one of the fields error, response. |
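The note about nanosecond precision applies to Python, whose `datetime` type stores microseconds only. One possible workaround, sketched below, is to parse the fractional part separately and return the full nanosecond count next to a microsecond-precision `datetime`. The function name `parse_timestamp` and the sample value are hypothetical, and the pattern handles only uppercase `T` and `Z`.

```python
import re
from datetime import datetime
from typing import Tuple

# Date-time, optional fraction of 1 to 9 digits, then "Z" or a numeric offset.
_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> Tuple[datetime, int]:
    """Return (datetime truncated to microseconds, full nanosecond part)."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not an RFC3339 timestamp: {value!r}")
    base, fraction, offset = match.groups()

    # Right-pad the fraction to 9 digits so ".5" becomes 500000000 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0

    # strptime's %z accepts offsets written as +HHMM.
    tz = "+0000" if offset == "Z" else offset.replace(":", "")
    parsed = datetime.strptime(base + tz, "%Y-%m-%dT%H:%M:%S%z")

    # datetime keeps microseconds only; the extra digits stay in `nanos`.
    return parsed.replace(microsecond=nanos // 1000), nanos


created_at, created_nanos = parse_timestamp("2024-01-02T03:04:05.123456789Z")
```

Keeping the nanosecond count as a separate integer avoids silently losing the last three digits when timestamps are compared or logged.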