# Sending an asynchronous request
You can send requests to YandexGPT API models in asynchronous mode. In response to an asynchronous request, the model returns an Operation object containing an operation ID, which you can use to track the request's execution and get the result once generation is completed. Use this mode if you do not need an urgent response: asynchronous requests take longer to complete than synchronous ones.
## Getting started
Get API authentication credentials as described in Authentication with the Yandex Foundation Models API.
## Send a request to the model
To use the examples, install cURL.
The examples below are intended to be run on macOS and Linux. To run them in Windows, see how to work with Bash in Microsoft Windows.
- Create a file with the request body, e.g., `body.json`:

  ```json
  {
    "modelUri": "gpt://<folder_ID>/yandexgpt-lite",
    "completionOptions": {
      "stream": false,
      "temperature": 0.1,
      "maxTokens": "2000"
    },
    "messages": [
      {
        "role": "system",
        "text": "Translate the text"
      },
      {
        "role": "user",
        "text": "To be, or not to be: that is the question."
      }
    ]
  }
  ```
  Where:

  - `modelUri`: ID of the model to generate the response. The parameter contains the ID of a Yandex Cloud folder or the ID of a model fine-tuned in DataSphere.
  - `completionOptions`: Request configuration options:
    - `stream`: Enables streaming of partially generated text. It may take either the `true` or `false` value.
    - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. This parameter accepts values between `0` and `1`, inclusive. The default value is `0.3`.
    - `maxTokens`: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Quotas and limits in Yandex Foundation Models.
  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used to send user messages to the model.
      - `system`: Used to set the request context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to save the conversation context. Do not send user messages with this role.
    - `text`: Text content of the message.
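If you build the request body in code rather than in a static file, the fields above map to a plain dictionary. A minimal sketch, assuming Python; the `make_request_body` helper name and its defaults are illustrative, not part of the API:

```python
def make_request_body(folder_id, user_text,
                      system_text="Translate the text",
                      temperature=0.1, max_tokens=2000):
    # Mirrors the body.json fields described above. Note that the API
    # expects maxTokens as a string, as in the example file.
    return {
        "modelUri": f"gpt://{folder_id}/yandexgpt-lite",
        "completionOptions": {
            "stream": False,
            "temperature": temperature,
            "maxTokens": str(max_tokens),
        },
        "messages": [
            {"role": "system", "text": system_text},
            {"role": "user", "text": user_text},
        ],
    }
```

Serializing this dictionary with `json.dumps` yields a body equivalent to `body.json`.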
- Send a request to the model by running this command:

  ```bash
  export FOLDER_ID=<folder_ID>
  export IAM_TOKEN=<IAM_token>
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "x-folder-id: ${FOLDER_ID}" \
    --data "@<path_to_JSON_file>" \
    "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
  ```
  Where:

  - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
  - `IAM_TOKEN`: IAM token you got before you started.
  In response, the service will return the Operation object:

  ```json
  {
    "id": "d7qi6shlbvo5********",
    "description": "Async GPT Completion",
    "createdAt": "2023-11-30T18:31:32Z",
    "createdBy": "aje2stn6id9k********",
    "modifiedAt": "2023-11-30T18:31:33Z",
    "done": false,
    "metadata": null
  }
  ```
  Save the operation `id` you get in the response.
- Send a request to get the operation result:

  ```bash
  curl \
    --request GET \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    "https://operation.api.cloud.yandex.net/operations/<operation_ID>"
  ```
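Since `done` stays `false` until generation finishes, this GET request usually has to be repeated until the operation completes. A minimal polling sketch in Python; the `wait_for_operation` helper and its parameters are illustrative, not part of the Yandex Cloud API or SDK:

```python
import time

def wait_for_operation(fetch, interval=2.0, max_polls=100):
    """Poll until the Operation object reports done=true.

    `fetch` is any callable returning the Operation as a dict, e.g. an
    HTTP GET against the operations endpoint shown above with the
    Authorization: Bearer header set.
    """
    for _ in range(max_polls):
        operation = fetch()
        if operation.get("done"):
            return operation
        time.sleep(interval)
    raise TimeoutError("operation did not complete within the polling budget")
```

Injecting the HTTP call as a callable keeps the retry logic separate from transport details and makes the loop easy to test.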
  Result example:

  ```json
  {
    "done": true,
    "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.foundation_models.v1.CompletionResponse",
      "alternatives": [
        {
          "message": {
            "role": "assistant",
            "text": "To be, or not to be, that is the question."
          },
          "status": "ALTERNATIVE_STATUS_FINAL"
        }
      ],
      "usage": {
        "inputTextTokens": "31",
        "completionTokens": "10",
        "totalTokens": "41"
      },
      "modelVersion": "18.01.2024"
    },
    "id": "d7qo21o5fj1u********",
    "description": "Async GPT Completion",
    "createdAt": "2024-05-12T18:46:54Z",
    "createdBy": "ajes08feato8********",
    "modifiedAt": "2024-05-12T18:46:55Z"
  }
  ```
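As the result example shows, once `done` is `true` the generated text sits at `response.alternatives[0].message.text`. A small sketch of pulling it out of the parsed JSON; the `extract_text` helper is hypothetical:

```python
def extract_text(operation):
    # Extract the generated text from a completed Operation object.
    if not operation.get("done"):
        raise ValueError("operation is not finished yet")
    # Each alternative carries a message with a role and text; take the first.
    return operation["response"]["alternatives"][0]["message"]["text"]
```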