Sending an asynchronous request
You can send requests to text generation models in asynchronous mode. In response to an asynchronous request, the model returns an operation object containing the operation ID, which you can use to track the operation's progress and get the result once the generation is complete. Use this mode if you do not need an urgent response, since asynchronous requests take longer to complete than synchronous ones.
Getting started
To use the request examples with the SDK:

1. Create a service account and assign it the `ai.languageModels.user` role.

2. Get the service account API key and save it.

   The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

   Note

   If you are using Windows, we recommend installing the WSL shell first and then using it to perform the remaining steps.

3. Install Python 3.10 or higher.

4. Install Python venv to create isolated virtual environments in Python.

5. Create a new Python virtual environment and activate it:

   ```bash
   python3 -m venv new-env
   source new-env/bin/activate
   ```

6. Use the pip package manager to install the ML SDK library:

   ```bash
   pip install yandex-cloud-ml-sdk
   ```
To use the request examples with the REST API:

- Get API authentication credentials as described in Authentication with the Yandex AI Studio API.
- Install cURL.
Send a request to the model
When using Yandex Cloud ML SDK, you can configure your code to wait for the operation to complete and return the response. To do this, use either the sleep function of the time module or the wait method. The example below demonstrates both approaches, one after the other.
1. Create a file named `generate-deferred.py` and paste the following code into it:

   ```python
   #!/usr/bin/env python3

   from __future__ import annotations

   import time

   from yandex_cloud_ml_sdk import YCloudML

   messages_1 = [
       {
           "role": "system",
           "text": "Find errors in the text and correct them",
       },
       {
           "role": "user",
           "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's room room. It withsatnds moisturre and mechanical dammage thanks to a 0.2 mm thick proctive layer of melamine films and a wax-treated interlocking system.""",
       },
   ]

   messages_2 = [
       {"role": "system", "text": "Find errors in the text and correct them"},
       {"role": "user", "text": "Erors wyll not corrct themselfs."},
   ]


   def main():
       sdk = YCloudML(
           folder_id="<folder_ID>",
           auth="<API_key>",
       )

       model = sdk.models.completions("yandexgpt")

       # Variant 1: wait for the operation to complete
       # using 5-second sleep periods
       print("Variant 1:")

       operation = model.configure(temperature=0.5).run_deferred(messages_1)

       status = operation.get_status()
       while status.is_running:
           time.sleep(5)
           status = operation.get_status()

       result = operation.get_result()
       print(result)

       # Variant 2: wait for the operation to complete
       # using the wait method
       print("Variant 2:")

       operation = model.run_deferred(messages_2)

       result = operation.wait()
       print(result)


   if __name__ == "__main__":
       main()
   ```

   Where:
   Note

   As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the `TextMessage` class, or an array containing any combination of these data types. For more information, see Yandex Cloud ML SDK usage.
   - `messages_1` and `messages_2`: Arrays of messages providing the context for the model, each used with a different method of getting the asynchronous request result:

     - `role`: Message sender's role:

       - `user`: Used to send user messages to the model.
       - `system`: Used to set the request context and define the model's behavior.
       - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.

     - `text`: Message text.

   - `<folder_ID>`: ID of the folder in which the service account was created.
   - `<API_key>`: The service account API key you got earlier, required for authentication in the API.
For more information about accessing a specific model version, see Accessing models.
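To illustrate the `assistant` role described above: in a multi-turn exchange, the model's previous reply is replayed with the `assistant` role so the conversation context is preserved. The message texts below are made up for this example.

```python
# A chat-style context for a follow-up request. The model's earlier
# reply is replayed with the "assistant" role so it keeps the context;
# only "user" and "system" messages originate from your side.
chat_messages = [
    {"role": "system", "text": "Find errors in the text and correct them"},
    {"role": "user", "text": "Erors wyll not corrct themselfs."},
    # The model's earlier answer, included to preserve context:
    {"role": "assistant", "text": "Errors will not correct themselves."},
    # The next user turn:
    {"role": "user", "text": "Now explain each correction."},
]
```

This array can be passed to `run_deferred` the same way as `messages_1` and `messages_2` above.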
2. Run the file you created:

   ```bash
   python3 generate-deferred.py
   ```

   Result:

   ```text
   Variant 1:
   GPTModelResult(alternatives=(Alternative(role='assistant', text='Laminate flooring is suitable for installation in the kitchen or in a child's room. It withstands moisture and mechanical damage thanks to a 0.2 mm thick protective layer of melamine films and a wax-treated interlocking system.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=74, completion_tokens=46, total_tokens=120), model_version='23.10.2024')
   Variant 2:
   GPTModelResult(alternatives=(Alternative(role='assistant', text='Errors will not correct themselves.\n\nErors → errors.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=32, completion_tokens=16, total_tokens=48), model_version='23.10.2024')
   ```

   The code waits for the result of the first method and then for that of the second.
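The polling loop from variant 1 can be factored into a reusable helper with a timeout guard. The sketch below is illustrative and not part of the SDK; it assumes only that the operation object exposes `get_status()` (returning an object with an `is_running` attribute) and `get_result()`, as the SDK objects above do.

```python
import time


def wait_for_operation(operation, poll_interval=5.0, timeout=300.0):
    """Poll an operation until it finishes, raising on timeout.

    `operation` is assumed to expose get_status() and get_result(),
    mirroring the deferred-operation objects used in the example above.
    """
    deadline = time.monotonic() + timeout
    while operation.get_status().is_running:
        if time.monotonic() > deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(poll_interval)
    return operation.get_result()
```

Compared to calling `wait()`, this lets you choose the poll interval and fail fast if generation takes too long.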
To use the following examples, install cURL.

The example below is intended to be run in macOS and Linux. To run it in Windows, see how to work with Bash in Microsoft Windows.
1. Create a file with the request body, e.g., `body.json`:

   ```json
   {
     "modelUri": "gpt://<folder_ID>/yandexgpt",
     "completionOptions": {
       "stream": false,
       "temperature": 0.1,
       "maxTokens": "2000",
       "reasoningOptions": {
         "mode": "DISABLED"
       }
     },
     "messages": [
       {
         "role": "system",
         "text": "Translate the text"
       },
       {
         "role": "user",
         "text": "To be, or not to be: that is the question."
       }
     ]
   }
   ```

   Where:
   - `modelUri`: ID of the model to be used for generating the response. The parameter contains the Yandex Cloud folder ID or the tuned model's ID.

   - `completionOptions`: Request configuration options:

     - `stream`: Enables streaming of partially generated text. It can be either `true` or `false`.
     - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. Its values range from `0` to `1`, inclusive. The default value is `0.3`.
     - `maxTokens`: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Yandex AI Studio quotas and limits.
     - `reasoningOptions.mode`: Reasoning mode parameters. This is an optional parameter. The default value is `DISABLED`. The possible values are:

       - `DISABLED`: Reasoning mode is disabled.
       - `ENABLED_HIDDEN`: Reasoning mode is enabled. The model decides by itself whether or not to use it for each particular request.

   - `messages`: List of messages that set the context for the model:

     - `role`: Message sender's role:

       - `user`: Used to send user messages to the model.
       - `system`: Used to set the request context and define the model's behavior.
       - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.

     - `text`: Message text.
2. Send a request to the model by running this command:

   ```bash
   export FOLDER_ID=<folder_ID>
   export IAM_TOKEN=<IAM_token>
   curl \
     --request POST \
     --header "Content-Type: application/json" \
     --header "Authorization: Bearer ${IAM_TOKEN}" \
     --header "x-folder-id: ${FOLDER_ID}" \
     --data "@<path_to_JSON_file>" \
     "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
   ```

   Where:

   - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
   - `IAM_TOKEN`: IAM token you got before you started.
   In response, the service will return the `operation` object:

   ```json
   {
     "id": "d7qi6shlbvo5********",
     "description": "Async GPT Completion",
     "createdAt": "2023-11-30T18:31:32Z",
     "createdBy": "aje2stn6id9k********",
     "modifiedAt": "2023-11-30T18:31:33Z",
     "done": false,
     "metadata": null
   }
   ```

   Save the operation `id` you get in the response.
3. Send a request to get the operation result:

   ```bash
   curl \
     --request GET \
     --header "Authorization: Bearer ${IAM_TOKEN}" \
     "https://operation.api.cloud.yandex.net/operations/<operation_ID>"
   ```

   Sample result:

   ```json
   {
     "done": true,
     "response": {
       "@type": "type.googleapis.com/yandex.cloud.ai.foundation_models.v1.CompletionResponse",
       "alternatives": [
         {
           "message": {
             "role": "assistant",
             "text": "To be, or not to be, that is the question."
           },
           "status": "ALTERNATIVE_STATUS_FINAL"
         }
       ],
       "usage": {
         "inputTextTokens": "31",
         "completionTokens": "10",
         "totalTokens": "41"
       },
       "modelVersion": "18.01.2024"
     },
     "id": "d7qo21o5fj1u********",
     "description": "Async GPT Completion",
     "createdAt": "2024-05-12T18:46:54Z",
     "createdBy": "ajes08feato8********",
     "modifiedAt": "2024-05-12T18:46:55Z"
   }
   ```
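The same two-step flow (POST to `completionAsync`, then poll the operation endpoint) can also be scripted in Python with only the standard library. This is an illustrative sketch, not an official client: the URLs and headers mirror the curl commands above, and the placeholder values must be replaced with real credentials before running.

```python
import json
import urllib.request

# Endpoints taken from the curl examples above.
COMPLETION_URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
OPERATION_URL = "https://operation.api.cloud.yandex.net/operations/"


def build_completion_request(folder_id: str, iam_token: str, body: dict) -> urllib.request.Request:
    """Build the POST request with the same headers as the curl example."""
    return urllib.request.Request(
        COMPLETION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {iam_token}",
            "x-folder-id": folder_id,
        },
        method="POST",
    )


def get_operation(operation_id: str, iam_token: str) -> dict:
    """Fetch the current state of an operation by its ID."""
    request = urllib.request.Request(
        OPERATION_URL + operation_id,
        headers={"Authorization": f"Bearer {iam_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires real credentials, so it is not executed here):
#   request = build_completion_request("<folder_ID>", "<IAM_token>", body)
#   with urllib.request.urlopen(request) as response:
#       operation = json.load(response)
#   Then call get_operation(operation["id"], "<IAM_token>") periodically
#   until the returned object has "done": true, and read its "response".
```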
See also
- Overview of Yandex AI Studio AI models
- Examples of working with ML SDK on GitHub