SpeechKit Recognition API v3, REST: AsyncRecognizer.RecognizeFile
Performs asynchronous speech recognition.
HTTP request
POST https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync
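As a minimal sketch of how this endpoint could be called, the snippet below prepares (but does not send) the POST request with Python's standard library. The `Authorization` header value and the example `uri` are hypothetical placeholders; this section does not state the authentication scheme, so treat that header as an assumption.

```python
import json
import urllib.request

ENDPOINT = "https://stt.api.cloud.yandex.net/stt/v3/recognizeFileAsync"


def build_request(body: dict, auth_header: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth scheme is not described in this section.
            "Authorization": auth_header,
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_request({"uri": "https://storage.example/bucket/audio.wav"}, "Api-Key <your-key>")
```

Passing `req` to `urllib.request.urlopen` would actually submit the job; the response body is the Operation resource described under Response.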
Body parameters
{
  // Includes only one of the fields `content`, `uri`
  "content": "string",
  "uri": "string",
  // end of the list of possible fields
  "recognitionModel": {
    "model": "string",
    "audioFormat": {
      // Includes only one of the fields `rawAudio`, `containerAudio`
      "rawAudio": {
        "audioEncoding": "string",
        "sampleRateHertz": "string",
        "audioChannelCount": "string"
      },
      "containerAudio": {
        "containerAudioType": "string"
      }
      // end of the list of possible fields
    },
    "textNormalization": {
      "textNormalization": "string",
      "profanityFilter": "boolean",
      "literatureText": "boolean",
      "phoneFormattingMode": "string"
    },
    "languageRestriction": {
      "restrictionType": "string",
      "languageCode": [
        "string"
      ]
    },
    "audioProcessingType": "string"
  },
  "recognitionClassifier": {
    "classifiers": [
      {
        "classifier": "string",
        "triggers": [
          "string"
        ]
      }
    ]
  },
  "speechAnalysis": {
    "enableSpeakerAnalysis": "boolean",
    "enableConversationAnalysis": "boolean",
    "descriptiveStatisticsQuantiles": [
      "string"
    ]
  },
  "speakerLabeling": {
    "speakerLabeling": "string"
  },
  "summarization": {
    "modelUri": "string",
    "properties": [
      {
        "instruction": "string",
        // Includes only one of the fields `jsonObject`, `jsonSchema`
        "jsonObject": "boolean",
        "jsonSchema": {
          "schema": "object"
        }
        // end of the list of possible fields
      }
    ]
  }
}
| Field | Description |
|---|---|
| content | string (bytes). Bytes with data. Includes only one of the fields `content`, `uri`. |
| uri | string. S3 data URL. Includes only one of the fields `content`, `uri`. |
| recognitionModel | RecognitionModelOptions. Configuration for the speech recognition model. |
| recognitionClassifier | RecognitionClassifierOptions. Configuration for classifiers over speech recognition. |
| speechAnalysis | SpeechAnalysisOptions. Configuration for speech analysis over speech recognition. |
| speakerLabeling | SpeakerLabelingOptions. Configuration for speaker labeling. |
| summarization | SummarizationOptions. Summarization options. |
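Because `content` and `uri` are mutually exclusive, a client may want to enforce that before sending anything. The helper below is a sketch with a hypothetical name (`audio_source`). It assumes that the `string (bytes)` field carries base64 text, as in the standard protobuf JSON mapping; this section does not confirm the encoding.

```python
import base64
from typing import Optional


def audio_source(content: Optional[bytes] = None, uri: Optional[str] = None) -> dict:
    """Return the part of the request body that identifies the audio.

    Exactly one of `content` or `uri` must be given.
    """
    if (content is None) == (uri is None):
        raise ValueError("set exactly one of `content` or `uri`")
    if content is not None:
        # Assumption: bytes are sent base64-encoded (protobuf JSON mapping).
        return {"content": base64.b64encode(content).decode("ascii")}
    return {"uri": uri}


inline = audio_source(content=b"abc")
remote = audio_source(uri="https://storage.example/bucket/audio.wav")  # hypothetical URL
```

The returned dictionary can be merged with the remaining options, for example `{**remote, "recognitionModel": {...}}`.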
RecognitionModelOptions
| Field | Description |
|---|---|
| model | string. Sets the recognition model for the cloud version of SpeechKit. |
| audioFormat | AudioFormatOptions. Format of the input audio. |
| textNormalization | TextNormalizationOptions. Text normalization options. |
| languageRestriction | LanguageRestrictionOptions. Possible languages in the audio. |
| audioProcessingType | enum (AudioProcessingType) For |
AudioFormatOptions
Audio format options.
| Field | Description |
|---|---|
| rawAudio | RawAudio. RAW audio without a container. Includes only one of the fields `rawAudio`, `containerAudio`. |
| containerAudio | ContainerAudio. Audio wrapped in a container. Includes only one of the fields `rawAudio`, `containerAudio`. |
RawAudio
RAW Audio format spec (no container to infer type). Used in AudioFormat options.
| Field | Description |
|---|---|
| audioEncoding | enum (AudioEncoding). Type of audio encoding. |
| sampleRateHertz | string (int64). PCM sample rate. |
| audioChannelCount | string (int64). PCM channel count. Currently only single channel audio is supported in real-time recognition. |
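The int64 fields of RawAudio appear as strings in the JSON schema, which is easy to miss. The sketch below builds a `rawAudio` entry for `audioFormat` and converts the numbers accordingly. The function name is hypothetical, and the encoding value `LINEAR16_PCM` is an assumption: the list of AudioEncoding values is not included in this section.

```python
def raw_audio_format(encoding: str, sample_rate_hertz: int, channel_count: int = 1) -> dict:
    """Build the `audioFormat` object for headerless (RAW) audio."""
    if sample_rate_hertz <= 0:
        raise ValueError("sample_rate_hertz must be positive")
    if channel_count <= 0:
        raise ValueError("channel_count must be positive")
    return {
        "rawAudio": {
            "audioEncoding": encoding,
            # int64 values are written as strings in the JSON schema.
            "sampleRateHertz": str(sample_rate_hertz),
            "audioChannelCount": str(channel_count),
        }
    }


# Assumption: "LINEAR16_PCM" is a valid AudioEncoding value.
fmt = raw_audio_format("LINEAR16_PCM", 16000)
```

Since `rawAudio` and `containerAudio` are alternatives, the object returned here should not be combined with a `containerAudio` entry.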
ContainerAudio
Audio with fixed type in container. Used in AudioFormat options.
| Field | Description |
|---|---|
| containerAudioType | enum (ContainerAudioType). Type of audio container. |
TextNormalizationOptions
Options for post-processing text results. The normalization levels depend on the settings and the language.
For detailed information, see documentation.
| Field | Description |
|---|---|
| textNormalization | enum (TextNormalization) |
| profanityFilter | boolean. Profanity filter (default: false). |
| literatureText | boolean. Rewrite text in literature style (default: false). |
| phoneFormattingMode | enum (PhoneFormattingMode). Defines the phone formatting mode. |
LanguageRestrictionOptions
Type of restriction for the list of languages expected in the incoming audio.
| Field | Description |
|---|---|
| restrictionType | enum (LanguageRestrictionType). Language restriction type. |
| languageCode[] | string. The list of language codes to restrict recognition in the case of an auto model. |
RecognitionClassifierOptions
| Field | Description |
|---|---|
| classifiers[] | RecognitionClassifier. List of classifiers to use. For detailed information and usage example, see documentation. |
RecognitionClassifier
| Field | Description |
|---|---|
| classifier | string. Classifier name. |
| triggers[] | enum (TriggerType). Response types for which classification results are returned. Classification responses follow the responses of the specified types. |
SpeechAnalysisOptions
| Field | Description |
|---|---|
| enableSpeakerAnalysis | boolean. Analyse speech for every speaker. |
| enableConversationAnalysis | boolean. Analyse conversation of two speakers. |
| descriptiveStatisticsQuantiles[] | string. Quantile levels in range (0, 1) for descriptive statistics. |
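The quantile levels are listed as strings and must fall inside (0, 1). A small sketch of how a client might validate and serialize them follows. The function name is hypothetical, and treating the range as an open interval (0 and 1 excluded) is my reading of the notation above.

```python
from typing import Iterable


def speech_analysis_options(
    quantiles: Iterable[float],
    speaker_analysis: bool = True,
    conversation_analysis: bool = False,
) -> dict:
    """Build the `speechAnalysis` object, rejecting out-of-range quantiles."""
    levels = list(quantiles)
    for q in levels:
        # Assumption: (0, 1) is an open interval.
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile level {q} is outside (0, 1)")
    return {
        "enableSpeakerAnalysis": speaker_analysis,
        "enableConversationAnalysis": conversation_analysis,
        # The schema lists the quantile levels as strings.
        "descriptiveStatisticsQuantiles": [str(q) for q in levels],
    }


analysis = speech_analysis_options([0.5, 0.9])
```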
SpeakerLabelingOptions
| Field | Description |
|---|---|
| speakerLabeling | enum (SpeakerLabeling). Specifies the execution of speaker labeling. |
SummarizationOptions
Represents transcription summarization options.
| Field | Description |
|---|---|
| modelUri | string. The ID of the model to be used for completion generation. |
| properties[] | SummarizationProperty. A list of summarizations to perform on the transcription. |
SummarizationProperty
Represents summarization entry for transcription.
| Field | Description |
|---|---|
| instruction | string. Summarization instruction for the model. |
| jsonObject | boolean. When set to true, the model will return a valid JSON object. Includes only one of the fields `jsonObject`, `jsonSchema`. Specifies the format of the model's response. |
| jsonSchema | JsonSchema. Enforces a specific JSON structure for the model's response based on a provided schema. Includes only one of the fields `jsonObject`, `jsonSchema`. Specifies the format of the model's response. |
JsonSchema
Represents the expected structure of the model's response using a JSON Schema.
| Field | Description |
|---|---|
| schema | object. The JSON Schema that the model's output must conform to. |
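To illustrate how a summarization entry with a JSON Schema might be assembled, here is a sketch. The function name, the instruction text, the schema and the `modelUri` placeholder are all hypothetical. The helper rejects setting both `jsonObject` and `jsonSchema`; allowing neither to be set is an assumption, since this section does not say whether a response format is required.

```python
from typing import Optional


def summarization_property(
    instruction: str,
    json_object: Optional[bool] = None,
    json_schema: Optional[dict] = None,
) -> dict:
    """Build one entry of `summarization.properties`."""
    if json_object is not None and json_schema is not None:
        raise ValueError("set at most one of `jsonObject` or `jsonSchema`")
    prop: dict = {"instruction": instruction}
    if json_object is not None:
        prop["jsonObject"] = json_object
    elif json_schema is not None:
        # The schema itself is nested under the `schema` key of JsonSchema.
        prop["jsonSchema"] = {"schema": json_schema}
    return prop


# Hypothetical instruction and schema, for illustration only.
schema = {
    "type": "object",
    "properties": {"topic": {"type": "string"}},
    "required": ["topic"],
}
summarization = {
    "modelUri": "<model-uri>",  # placeholder; the URI format is not given here
    "properties": [summarization_property("Name the main topic of the call.", json_schema=schema)],
}
```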
Response
HTTP Code: 200 - OK
{
  "id": "string",
  "description": "string",
  "createdAt": "string",
  "createdBy": "string",
  "modifiedAt": "string",
  "done": "boolean",
  "metadata": "object",
  // Includes only one of the fields `error`
  "error": {
    "code": "integer",
    "message": "string",
    "details": [
      "object"
    ]
  }
  // end of the list of possible fields
}
An Operation resource. For more information, see Operation.
| Field | Description |
|---|---|
| id | string. ID of the operation. |
| description | string. Description of the operation. 0-256 characters long. |
| createdAt | string (date-time). Creation timestamp. String in RFC3339 To work with values in this field, use the APIs described in the |
| createdBy | string. ID of the user or service account who initiated the operation. |
| modifiedAt | string (date-time). The time when the Operation resource was last modified. String in RFC3339 To work with values in this field, use the APIs described in the |
| done | boolean If the value is |
| metadata | object. Service-specific metadata associated with the operation. |
| error | Status. The error result of the operation in case of failure or cancellation. Includes only one of the fields `error`. The operation result. |
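A client typically needs to tell a pending operation from a failed or finished one, and to read the timestamps. The sketch below does both with hypothetical helper names. It assumes that `done` set to false means the operation is still running (the description of `done` is cut off in this section) and that timestamps end in `Z` or a numeric offset.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC3339 timestamp such as 2024-01-15T10:30:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def operation_state(operation: dict) -> str:
    """Classify an Operation resource as pending, failed, or finished."""
    # Assumption: a false or missing `done` means still in progress.
    if not operation.get("done", False):
        return "pending"
    return "failed" if "error" in operation else "finished"


# Hypothetical response, for illustration only.
sample = {
    "id": "op-123",
    "createdAt": "2024-01-15T10:30:00Z",
    "done": True,
    "error": {"code": 3, "message": "bad audio format", "details": []},
}
state = operation_state(sample)
created = parse_rfc3339(sample["createdAt"])
```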
Status
The error result of the operation in case of failure or cancellation.
| Field | Description |
|---|---|
| code | integer (int32). Error code. An enum value of google.rpc.Code. |
| message | string. An error message. |
| details[] | object. A list of messages that carry the error details. |
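Since `code` is a numeric google.rpc.Code value, logs are easier to read when it is mapped to its name. The following sketch does that with the standard google.rpc.Code numbering; the function name and the sample error are hypothetical.

```python
# google.rpc.Code names, indexed by their numeric value (0 to 16).
RPC_CODE_NAMES = (
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
)


def describe_status(status: dict) -> str:
    """Format a Status object as '<CODE_NAME>: <message>'."""
    code = status.get("code", 0)
    if 0 <= code < len(RPC_CODE_NAMES):
        name = RPC_CODE_NAMES[code]
    else:
        name = f"UNRECOGNIZED({code})"
    return f"{name}: {status.get('message', '')}"


# Hypothetical error, for illustration only.
text = describe_status({"code": 3, "message": "bad audio format", "details": []})
```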