Sending an asynchronous request
You can send requests to text generation models in asynchronous mode. In response to an asynchronous request, the model returns an operation object containing the operation ID, which you can use to track the operation's progress and get the result once the generation is complete. Use this mode if you do not need an urgent response, since asynchronous requests take longer to complete than synchronous ones.
Getting started
To use the request examples with the SDK:

1. Create a service account and assign it the `ai.languageModels.user` role.

2. Get the service account API key and save it.

   The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

   Note

   If you are using Windows, we recommend installing the WSL shell first and then using it to perform the remaining steps.

3. Install Python 3.10 or higher.

4. Install Python venv to create isolated virtual environments in Python.

5. Create a new Python virtual environment and activate it:

   ```bash
   python3 -m venv new-env
   source new-env/bin/activate
   ```

6. Use the pip package manager to install the ML SDK library:

   ```bash
   pip install yandex-cloud-ml-sdk
   ```
To use the request examples with the REST API:

- Get API authentication credentials as described in Authentication with the Yandex AI Studio API.
- Install cURL.
Send a request to the model
When using Yandex Cloud ML SDK, you can configure your code to wait for the operation to complete and return the response. To do this, use either the sleep function of the time module or the wait method. The example below demonstrates both approaches, one after the other.
1. Create a file named `generate-deferred.py` and paste the following code into it:

   ```python
   #!/usr/bin/env python3

   from __future__ import annotations

   import time

   from yandex_cloud_ml_sdk import YCloudML

   messages_1 = [
       {
           "role": "system",
           "text": "Find errors in the text and correct them",
       },
       {
           "role": "user",
           "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's room room. It withsatnds moisturre and mechanical dammage thanks to a 0.2 mm thick proctive layer of melamine films and a wax-treated interlocking system.""",
       },
   ]

   messages_2 = [
       {"role": "system", "text": "Find errors in the text and correct them"},
       {"role": "user", "text": "Erors wyll not corrct themselfs."},
   ]


   def main():
       sdk = YCloudML(
           folder_id="<folder_ID>",
           auth="<API_key>",
       )

       model = sdk.models.completions("yandexgpt")

       # Variant 1: wait for the operation to complete
       # using 5-second sleep periods
       print("Variant 1:")

       operation = model.configure(temperature=0.5).run_deferred(messages_1)

       status = operation.get_status()
       while status.is_running:
           time.sleep(5)
           status = operation.get_status()

       result = operation.get_result()
       print(result)

       # Variant 2: wait for the operation to complete
       # using the wait method
       print("Variant 2:")

       operation = model.run_deferred(messages_2)

       result = operation.wait()
       print(result)


   if __name__ == "__main__":
       main()
   ```

   Where:
   Note

   As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the `TextMessage` class, or an array containing any combination of these data types. For more information, see Yandex Cloud ML SDK usage.
   - `messages_1` and `messages_2`: Arrays of messages providing the context for the model, each used with a different method of getting the asynchronous request result:

     - `role`: Message sender's role:

       - `user`: Used to send user messages to the model.
       - `system`: Used to set the request context and define the model's behavior.
       - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.

     - `text`: Message text.

   - `<folder_ID>`: ID of the folder in which the service account was created.
   - `<API_key>`: The service account API key you got earlier, required for authentication in the API.
For more information about accessing a specific model version, see Accessing models.
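To illustrate the `assistant` role described above: in a multi-turn exchange, the model's previous reply is replayed with the `assistant` role so the conversation context is preserved. The message texts below are made up for this example.

```python
# A chat-style context for a follow-up request. The model's earlier
# reply is replayed with the "assistant" role so it keeps the context;
# only "user" and "system" messages originate from your side.
chat_messages = [
    {"role": "system", "text": "Find errors in the text and correct them"},
    {"role": "user", "text": "Erors wyll not corrct themselfs."},
    # The model's earlier answer, included to preserve context:
    {"role": "assistant", "text": "Errors will not correct themselves."},
    # The next user turn:
    {"role": "user", "text": "Now explain each correction."},
]
```

This array can be passed to `run_deferred` the same way as `messages_1` and `messages_2` above.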
2. Run the file you created:

   ```bash
   python3 generate-deferred.py
   ```

   Result:

   ```text
   Variant 1:
   GPTModelResult(alternatives=(Alternative(role='assistant', text='Laminate flooring is suitable for installation in the kitchen or in a child's room. It withstands moisture and mechanical damage thanks to a 0.2 mm thick protective layer of melamine films and a wax-treated interlocking system.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=74, completion_tokens=46, total_tokens=120), model_version='23.10.2024')
   Variant 2:
   GPTModelResult(alternatives=(Alternative(role='assistant', text='Errors will not correct themselves.\n\nErors → errors.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=32, completion_tokens=16, total_tokens=48), model_version='23.10.2024')
   ```

   The code waits for the result of the first method and then for that of the second.
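The polling loop from variant 1 can be factored into a reusable helper with a timeout guard. The sketch below is illustrative and not part of the SDK; it assumes only that the operation object exposes `get_status()` (returning an object with an `is_running` attribute) and `get_result()`, as the SDK objects above do.

```python
import time


def wait_for_operation(operation, poll_interval=5.0, timeout=300.0):
    """Poll an operation until it finishes, raising on timeout.

    `operation` is assumed to expose get_status() and get_result(),
    mirroring the deferred-operation objects used in the example above.
    """
    deadline = time.monotonic() + timeout
    while operation.get_status().is_running:
        if time.monotonic() > deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(poll_interval)
    return operation.get_result()
```

Compared to calling `wait()`, this lets you choose the poll interval and fail fast if generation takes too long.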
To use the following examples, install cURL.

The example below is intended to be run in macOS and Linux. To run it in Windows, see how to work with Bash in Microsoft Windows.
1. Create a file with the request body, e.g., `body.json`:

   ```json
   {
     "modelUri": "gpt://<folder_ID>/yandexgpt",
     "completionOptions": {
       "stream": false,
       "temperature": 0.1,
       "maxTokens": "2000",
       "reasoningOptions": {
         "mode": "DISABLED"
       }
     },
     "messages": [
       {
         "role": "system",
         "text": "Translate the text"
       },
       {
         "role": "user",
         "text": "To be, or not to be: that is the question."
       }
     ]
   }
   ```

   Where:
   - `modelUri`: ID of the model to be used for generating the response. The parameter contains the Yandex Cloud folder ID or the tuned model's ID.

   - `completionOptions`: Request configuration options:

     - `stream`: Enables streaming of partially generated text. It can be either `true` or `false`.
     - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. Its values range from `0` to `1`, inclusive. The default value is `0.3`.
     - `maxTokens`: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Yandex AI Studio quotas and limits.
     - `reasoningOptions.mode`: Reasoning mode parameters. This is an optional parameter. The default value is `DISABLED`. The possible values are:

       - `DISABLED`: Reasoning mode is disabled.
       - `ENABLED_HIDDEN`: Reasoning mode is enabled. The model decides by itself whether or not to use it for each particular request.

   - `messages`: List of messages that set the context for the model:

     - `role`: Message sender's role:

       - `user`: Used to send user messages to the model.
       - `system`: Used to set the request context and define the model's behavior.
       - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.

     - `text`: Message text.
2. Send a request to the model by running this command:

   ```bash
   export FOLDER_ID=<folder_ID>
   export IAM_TOKEN=<IAM_token>
   curl \
     --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer ${IAM_TOKEN}" \
     --header "x-folder-id: ${FOLDER_ID}" \
     --data "@<path_to_JSON_file>" \
     "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
   ```

   Where:

   - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
   - `IAM_TOKEN`: IAM token you got before you started.
   In response, the service will return the `operation` object:

   ```json
   {
     "id": "d7qi6shlbvo5********",
     "description": "Async GPT Completion",
     "createdAt": "2023-11-30T18:31:32Z",
     "createdBy": "aje2stn6id9k********",
     "modifiedAt": "2023-11-30T18:31:33Z",
     "done": false,
     "metadata": null
   }
   ```

   Save the operation `id` you get in the response.
3. Send a request to get the operation result:

   ```bash
   curl \
     --request GET \
     --header "Authorization: Bearer ${IAM_TOKEN}" \
     "https://operation.api.cloud.yandex.net/operations/<operation_ID>"
   ```

   Sample result:

   ```json
   {
     "done": true,
     "response": {
       "@type": "type.googleapis.com/yandex.cloud.ai.foundation_models.v1.CompletionResponse",
       "alternatives": [
         {
           "message": {
             "role": "assistant",
             "text": "To be, or not to be, that is the question."
           },
           "status": "ALTERNATIVE_STATUS_FINAL"
         }
       ],
       "usage": {
         "inputTextTokens": "31",
         "completionTokens": "10",
         "totalTokens": "41"
       },
       "modelVersion": "18.01.2024"
     },
     "id": "d7qo21o5fj1u********",
     "description": "Async GPT Completion",
     "createdAt": "2024-05-12T18:46:54Z",
     "createdBy": "ajes08feato8********",
     "modifiedAt": "2024-05-12T18:46:55Z"
   }
   ```
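The same two-step flow (POST to `completionAsync`, then poll the operation endpoint) can also be scripted in Python with only the standard library. This is an illustrative sketch, not an official client: the URLs and headers mirror the curl commands above, and the placeholder values must be replaced with real credentials before running.

```python
import json
import urllib.request

# Endpoints taken from the curl examples above.
COMPLETION_URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def build_completion_request(folder_id: str, iam_token: str, body: dict) -> urllib.request.Request:
    """Build the POST request with the same headers as the curl example."""
    return urllib.request.Request(
        COMPLETION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {iam_token}",
            "x-folder-id": folder_id,
        },
        method="POST",
    )


def get_operation(operation_id: str, iam_token: str) -> dict:
    """Fetch the current state of an operation by its ID."""
    request = urllib.request.Request(
        OPERATION_URL + operation_id,
        headers={"Authorization": f"Bearer {iam_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires real credentials, so it is not executed here):
#   request = build_completion_request("<folder_ID>", "<IAM_token>", body)
#   with urllib.request.urlopen(request) as response:
#       operation = json.load(response)
#   Then call get_operation(operation["id"], "<IAM_token>") periodically
#   until the returned object has "done": true, and read its "response".
```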
See also
- Overview of Yandex AI Studio AI models
- Examples of working with ML SDK on GitHub