How to improve the accuracy of translations
To increase the accuracy of translations:
- Specify the source language. Some words are written the same in different languages but have different meanings. If the model detects the wrong source language, these words are translated incorrectly.
- Specify your translation glossary. A word can be translated in different ways. For example, the English word oil can be translated into Russian as масло or нефть. You can use a glossary to specify the required translation of a word or phrase. For details, see Specify your translation glossary below.
Getting started
To use the examples, install cURL.
The examples below are intended to be run on macOS and Linux. To run them on Windows, see how to work with Bash in Microsoft Windows.
To authenticate as a service account, you can use an API key or an IAM token; to authenticate as a user account, you can only use an IAM token.
Get your account details for authentication in the Translate API.
If you use an API key (service accounts only):
- If you do not have a service account, create one.
- Assign the ai.translate.user role for the folder to the service account.
- Get the ID of the folder your service account was created in. Make sure to include the folder ID in the folderId field in the body of each request.
- Create an API key with the yc.ai.translate.execute scope. Provide the key in the Authorization header of each request in the following format: Authorization: Api-Key <API_key>
If you use an IAM token:
- Get the ID of any folder for which your account has the ai.translate.user role or higher. Make sure to include the folder ID in the folderId field in the body of each request.
- Get an IAM token for your Yandex account, federated account, or service account. Provide the token in the Authorization header of each request in the following format: Authorization: Bearer <IAM_token>
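Because the two authentication methods differ only in the Authorization header, a script can choose the header based on which credential is available. The sketch below assumes the credentials are stored in the environment variables IAM_TOKEN and API_KEY (names chosen for this example, not required by the API) and prefers the IAM token when both are set; auth_header is a hypothetical helper name.

```shell
# Hypothetical helper: print the Authorization header for Translate API requests.
# Assumption: credentials are kept in the variables IAM_TOKEN and API_KEY.
auth_header() {
  if [ -n "${IAM_TOKEN:-}" ]; then
    # IAM token: works for user accounts and service accounts.
    printf 'Authorization: Bearer %s\n' "$IAM_TOKEN"
  elif [ -n "${API_KEY:-}" ]; then
    # API key: service accounts only.
    printf 'Authorization: Api-Key %s\n' "$API_KEY"
  else
    echo "auth_header: set IAM_TOKEN or API_KEY first" >&2
    return 1
  fi
}
```

With either credential set, you could pass the result to curl as --header "$(auth_header)" in place of the Authorization header line in the examples below.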
Specify the source language
Words are sometimes written the same in different languages but translated differently. For example, the word angel means a spiritual being in English, while in German it means a fishing rod. If the text you provide contains such words, Translate may detect the wrong source language.
To avoid mistakes, specify the source language in the sourceLanguageCode field:
{
"folderId": "<folder_ID>",
"texts": ["angel"],
"targetLanguageCode": "ru",
"sourceLanguageCode": "de"
}
Where:
- folderId: Folder ID you got before you started.
- texts: Text to translate, as a list of strings.
- targetLanguageCode: Target language. You can get the language code together with a list of supported languages.
- sourceLanguageCode: Source language.
Save the request body to a file (for example, body.json) and send it to the translate method:
export API_KEY=<API_key>
curl \
--request POST \
--header "Content-Type: application/json" \
--header "Authorization: Api-Key ${API_KEY}" \
--data '@<path_to_JSON_file>' \
"https://translate.api.cloud.yandex.net/translate/v2/translate"
Where API_KEY is the API key you got before you started. If you use an IAM token for authentication, change the Authorization header to "Authorization: Bearer <IAM_token>".
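The save-and-send steps can also be scripted. In this sketch, the variable names FOLDER_ID, SOURCE_LANG, TARGET_LANG, and TEXT and the function name translate_file are hypothetical: the heredoc writes the same body as above to body.json, and translate_file wraps the curl command shown above. The sketch assumes TEXT contains no double quotes or backslashes, since the values are inserted into the JSON without escaping.

```shell
# Hypothetical variables; replace <folder_ID> or export FOLDER_ID beforehand.
FOLDER_ID="${FOLDER_ID:-<folder_ID>}"
SOURCE_LANG="de"
TARGET_LANG="ru"
TEXT="angel"

# Write the request body; the unquoted EOF lets the shell expand the variables.
# Assumption: TEXT contains no double quotes or backslashes (nothing is escaped).
cat > body.json <<EOF
{
  "folderId": "${FOLDER_ID}",
  "texts": ["${TEXT}"],
  "targetLanguageCode": "${TARGET_LANG}",
  "sourceLanguageCode": "${SOURCE_LANG}"
}
EOF

# Hypothetical wrapper around the curl command above (API key authentication).
translate_file() {
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Api-Key ${API_KEY}" \
    --data "@$1" \
    "https://translate.api.cloud.yandex.net/translate/v2/translate"
}
```

After export API_KEY=<API_key>, running translate_file body.json sends the saved body.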
This returns a translation from the correct language:
{
"translations": [
{
"text": "удочка"
}
]
}
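The response is JSON, so a script usually needs only the translations[].text values. The sketch below assumes python3 is available and uses its standard json module as the parser; extract_translations is a hypothetical helper name.

```shell
# Hypothetical helper: read a Translate API response on stdin and print each
# translations[].text value on its own line.
# Assumption: python3 is installed; it is used here only as a JSON parser.
extract_translations() {
  python3 -c '
import json
import sys

# Parse stdin as bytes so non-ASCII text does not depend on the locale.
data = json.load(sys.stdin.buffer)
for item in data.get("translations", []):
    sys.stdout.buffer.write((item["text"] + "\n").encode("utf-8"))
'
}
```

For example, appending | extract_translations to the curl command above should print only the translated text; with the response shown above, that is удочка.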
Specify your translation glossary
A word can be translated in different ways. For example, the English word oil can be translated into Russian as масло or нефть. To improve the accuracy of translations, use a glossary that maps each of your terms and phrases to a single translation.
Specify the glossary in the glossaryConfig field. Currently, you can only provide a glossary as an array of text pairs.
In the sourceLanguageCode field, specify the source language. This field is required when you use glossaries:
{
"sourceLanguageCode": "tr",
"targetLanguageCode": "ru",
"texts": [
"cırtlı çocuk spor ayakkabı"
],
"folderId": "<folder_ID>",
"glossaryConfig": {
"glossaryData": {
"glossaryPairs": [
{
"sourceText": "spor ayakkabı",
"translatedText": "кроссовки"
}
]
}
}
}
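Where:
- sourceLanguageCode: Source language. You can get the language code together with a list of supported languages.
- targetLanguageCode: Target language.
- texts: Text to translate, as a list of strings.
- folderId: Folder ID you got before you started.
- glossaryConfig: Glossary settings. In glossaryData.glossaryPairs, each pair contains sourceText, a term in the source language, and translatedText, the translation to use for that term.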
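Writing glossaryPairs by hand gets tedious as a glossary grows. The following sketch generates the array from a list of terms and writes a request body like the one above to body.json. The helper name glossary_pairs is hypothetical, and it assumes the terms contain no double quotes or backslashes, because it does not escape them for JSON.

```shell
# Hypothetical helper (not part of the Translate API): print a glossaryPairs
# JSON array from alternating SOURCE_TERM TRANSLATION arguments.
# Assumption: terms contain no double quotes or backslashes; nothing is escaped.
glossary_pairs() {
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "glossary_pairs: expected SOURCE_TERM TRANSLATION pairs" >&2
    return 1
  fi
  gp_out="["
  gp_sep=""
  while [ "$#" -gt 0 ]; do
    gp_out="${gp_out}${gp_sep}{\"sourceText\": \"$1\", \"translatedText\": \"$2\"}"
    gp_sep=", "
    shift 2
  done
  printf '%s]\n' "$gp_out"
}

# Write the glossary request body; the command substitution fills in the pairs.
cat > body.json <<EOF
{
  "sourceLanguageCode": "tr",
  "targetLanguageCode": "ru",
  "texts": ["cırtlı çocuk spor ayakkabı"],
  "folderId": "<folder_ID>",
  "glossaryConfig": {
    "glossaryData": {
      "glossaryPairs": $(glossary_pairs "spor ayakkabı" "кроссовки")
    }
  }
}
EOF
```

Adding a term means adding one more pair of arguments to the glossary_pairs call; an odd number of arguments is rejected so that a term is never left without a translation.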
Save the request body to a file (for example, body.json) and send it to the translate method:
export API_KEY=<API_key>
curl \
--request POST \
--header "Content-Type: application/json" \
--header "Authorization: Api-Key ${API_KEY}" \
--data '@<path_to_JSON_file>' \
"https://translate.api.cloud.yandex.net/translate/v2/translate"
Where API_KEY is the API key you got before you started. If you use an IAM token for authentication, change the Authorization header to "Authorization: Bearer <IAM_token>".
The response will contain a translation based on the terms from your glossary:
{
"translations": [
{
"text": "Детские кроссовки с липучкой"
}
]
}
Without the glossary, the translation would be:
{
"translations": [
{
"text": "детская спортивная обувь с липучкой"
}
]
}
Escaping text
To keep particular text fragments untranslated, set the text format to HTML in the request body and wrap those fragments in a <span> tag with the translate=no attribute. For example:
{
"format": "HTML",
"texts": [
"The e-mail has been changed. The new password is <span translate=no>%$Qvd14aa2NMc</span>"
]
}
Where:
- format: Text format.
- texts: Text to translate, as a list of strings.
In the response, the text inside the <span> tag remains untranslated:
{
"translations": [
{
"text": "L'e-mail a été modifié. Le nouveau mot de passe est <span translate=\"no\">%$Qvd14aa2NMc</span>"
}
]
}
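If you build request texts in a shell script, a small helper can wrap each fragment that must stay untranslated. The function name no_translate is hypothetical, and the sketch assumes the fragment contains no HTML special characters (<, >, &), double quotes, or backslashes, since it does no escaping.

```shell
# Hypothetical helper: wrap a fragment in <span translate=no> so that Translate
# leaves it as is. The request must also set "format": "HTML".
no_translate() {
  printf '<span translate=no>%s</span>' "$1"
}

# Build the text for the "texts" field. Single quotes keep the shell from
# treating $Qvd14aa2NMc as a variable.
TEXT="The new password is $(no_translate '%$Qvd14aa2NMc')"
printf '%s\n' "$TEXT"
```

The single quotes matter: in double quotes, the shell would try to expand $Qvd14aa2NMc and silently drop the password fragment. Output of a command substitution is not expanded again, so the $ survives inside TEXT.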
Checking words for typos
Misspelled words may be translated incorrectly or transliterated. For example, the word hellas is translated as эллада. If the same word is misspelled, say as helas, it is translated as хелас. Use the speller parameter to run a spell check:
{
"sourceLanguageCode": "en",
"targetLanguageCode": "ru",
"texts": [
"helas"
],
"folderId": "<folder_ID>",
"speller": true
}
Where:
- sourceLanguageCode: Source language. You can get the language code together with a list of supported languages.
- targetLanguageCode: Target language.
- texts: Text to translate, as a list of strings.
- folderId: Folder ID you got before you started.
- speller: Parameter that activates the spell checker.
Save the request body to a file (for example, body.json) and send it to the translate method:
export API_KEY=<API_key>
curl \
--request POST \
--header "Content-Type: application/json" \
--header "Authorization: Api-Key ${API_KEY}" \
--data '@<path_to_JSON_file>' \
"https://translate.api.cloud.yandex.net/translate/v2/translate"
Where API_KEY is the API key you got before you started. If you use an IAM token for authentication, change the Authorization header to "Authorization: Bearer <IAM_token>".
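A body with a syntax error, such as a missing comma between two fields, is not detected until the request is sent. To catch such mistakes locally, you can check the file before running curl. This sketch assumes python3 is installed and uses its json.tool module purely as a syntax checker; validate_body is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file is valid JSON; otherwise print the
# parser error (from json.tool) plus a short message and return 1.
# Assumption: python3 is installed.
validate_body() {
  if python3 -m json.tool "$1" > /dev/null; then
    return 0
  fi
  echo "validate_body: $1 is not valid JSON" >&2
  return 1
}
```

Run validate_body body.json first and send the request only if it succeeds, for example by joining it to the curl command with &&.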
The response will contain a spell-checked translation of the word:
{
"translations": [
{
"text": "эллада"
}
]
}
If the spell checker is not enabled ("speller": false), the word is translated as follows:
{
"translations": [
{
"text": "хелас"
}
]
}