Sending an asynchronous request
You can send requests to YandexGPT API models in asynchronous mode. In response to an asynchronous request, the model returns an operation object containing the operation ID, which you can use to track the operation's progress and get the result once generation is complete. Use this mode if you do not need an urgent response: asynchronous requests take longer to complete than synchronous ones.
Getting started
To use the SDK request examples:

- Create a service account and assign it the `ai.languageModels.user` role.
- Get the service account API key and save it.

  The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

- Use the pip package manager to install the ML SDK library:

  ```bash
  pip install yandex-cloud-ml-sdk
  ```

To use the cURL examples, get API authentication credentials as described in Authentication with the Yandex Foundation Models API, and install cURL.
Send a request to the model
When using Yandex Cloud ML SDK, you can configure your code to wait for the operation to complete and return the response. To do this, use either the `sleep` function from the `time` module or the operation's `wait` method. The example below demonstrates both approaches, one after the other.
- Create a file named `generate-deferred.py` and paste the following code into it:

  ```python
  #!/usr/bin/env python3

  from __future__ import annotations

  import time

  from yandex_cloud_ml_sdk import YCloudML

  messages_1 = [
      {
          "role": "system",
          "text": "Find errors in the text and correct them",
      },
      {
          "role": "user",
          "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's room. It withsatnds moisturre and mechanical dammage thanks to a 0.2 mm thick proctive layer of melamine films and a wax-treated interlocking system.""",
      },
  ]

  messages_2 = [
      {"role": "system", "text": "Find errors in the text and correct them"},
      {"role": "user", "text": "Erors wyll not corrct themselfs."},
  ]


  def main():
      sdk = YCloudML(
          folder_id="<folder_ID>",
          auth="<API_key>",
      )

      model = sdk.models.completions("yandexgpt")

      # Variant 1: wait for the operation to complete using 5-second sleep periods
      print("Variant 1:")

      operation = model.configure(temperature=0.5).run_deferred(messages_1)

      status = operation.get_status()
      while status.is_running:
          time.sleep(5)
          status = operation.get_status()

      result = operation.get_result()
      print(result)

      # Variant 2: wait for the operation to complete using the wait method
      print("Variant 2:")

      operation = model.run_deferred(messages_2)

      result = operation.wait()
      print(result)


  if __name__ == "__main__":
      main()
  ```
  Where:

  Note

  As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the `TextMessage` class, or an array containing any combination of these data types. For more information, see Yandex Cloud ML SDK usage.

  - `messages_1` and `messages_2`: Arrays of messages providing the context for the model, each used with a different method of getting the asynchronous request result:
    - `role`: Message sender's role:
      - `user`: Used for sending user messages to the model.
      - `system`: Used to set the request context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.
    - `text`: Message text.
  - `<folder_ID>`: ID of the folder in which the service account was created.
  - `<API_key>`: Service account API key you got earlier, required for authentication in the API.
- Run the created file:

  ```bash
  python3 generate-deferred.py
  ```

  Result:

  ```text
  Variant 1:
  GPTModelResult(alternatives=(Alternative(role='assistant', text='Ламинат подойдёт для укладки на кухне или в детской комнате – он не боится влаги и механических повреждений благодаря защитному слою из облицованных меламиновых плёнок толщиной 0,2 мм и обработанным воском замкам.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=74, completion_tokens=46, total_tokens=120), model_version='23.10.2024')
  Variant 2:
  GPTModelResult(alternatives=(Alternative(role='assistant', text='Errors will not correct themselves.\n\nErors → errors.', status=<AlternativeStatus.FINAL: 3>),), usage=Usage(input_text_tokens=32, completion_tokens=16, total_tokens=48), model_version='23.10.2024')
  ```
The code first waits for the result obtained with sleep-based polling and then for the result obtained with the `wait` method.
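The fixed five-second polling in Variant 1 can be made gentler with exponential backoff. Below is a minimal sketch of that pattern; the `StubOperation` class and its `polls_until_done` parameter are hypothetical stand-ins for the SDK's operation object, used here only so the example is self-contained:

```python
import time


class StubStatus:
    def __init__(self, is_running: bool):
        self.is_running = is_running


class StubOperation:
    # Hypothetical stand-in for the operation object run_deferred() returns;
    # it reports "running" for a fixed number of polls, then completes.
    def __init__(self, polls_until_done: int):
        self._polls = polls_until_done

    def get_status(self) -> StubStatus:
        self._polls -= 1
        return StubStatus(is_running=self._polls > 0)

    def get_result(self) -> str:
        return "generation result"


def wait_with_backoff(operation, initial_delay=1.0, factor=2.0, max_delay=30.0):
    # Poll get_status(), doubling the delay between polls up to max_delay.
    delay = initial_delay
    while operation.get_status().is_running:
        time.sleep(delay)
        delay = min(delay * factor, max_delay)
    return operation.get_result()


print(wait_with_backoff(StubOperation(polls_until_done=3), initial_delay=0.01))
# Prints: generation result
```

Backoff keeps early polls responsive while avoiding a tight polling loop for long generations.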
To use the cURL examples, install cURL. The example below is intended to be run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.
- Create a file with the request body, e.g., `body.json`:

  ```json
  {
    "modelUri": "gpt://<folder_ID>/yandexgpt-lite",
    "completionOptions": {
      "stream": false,
      "temperature": 0.1,
      "maxTokens": "2000"
    },
    "messages": [
      {
        "role": "system",
        "text": "Translate the text"
      },
      {
        "role": "user",
        "text": "To be, or not to be: that is the question."
      }
    ]
  }
  ```
  Where:

  - `modelUri`: ID of the model to be used for generating the response. The parameter contains the Yandex Cloud folder ID or the ID of a fine-tuned model.
  - `completionOptions`: Request configuration options:
    - `stream`: Enables streaming of partially generated text. It can be either `true` or `false`.
    - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. Its values range from `0` to `1`, inclusive. The default value is `0.3`.
    - `maxTokens`: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Quotas and limits in Yandex Foundation Models.
  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used for sending user messages to the model.
      - `system`: Used to set the request context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.
    - `text`: Message text.
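The request body described above can also be assembled programmatically before writing it to `body.json`. A minimal sketch; the `build_completion_body` helper is hypothetical, and only the field names and values come from the example above:

```python
import json


def build_completion_body(folder_id: str, user_text: str,
                          system_text: str = "Translate the text",
                          temperature: float = 0.1,
                          max_tokens: int = 2000) -> dict:
    # Assemble the completionAsync request body shown above.
    return {
        "modelUri": f"gpt://{folder_id}/yandexgpt-lite",
        "completionOptions": {
            "stream": False,
            # The API example passes maxTokens as a string.
            "temperature": temperature,
            "maxTokens": str(max_tokens),
        },
        "messages": [
            {"role": "system", "text": system_text},
            {"role": "user", "text": user_text},
        ],
    }


body = build_completion_body("<folder_ID>", "To be, or not to be: that is the question.")
with open("body.json", "w", encoding="utf-8") as f:
    json.dump(body, f, ensure_ascii=False, indent=2)
```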
- Send a request to the model by running this command:

  ```bash
  export FOLDER_ID=<folder_ID>
  export IAM_TOKEN=<IAM_token>
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "x-folder-id: ${FOLDER_ID}" \
    --data "@<path_to_JSON_file>" \
    "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
  ```

  Where:

  - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
  - `IAM_TOKEN`: IAM token you got before you started.
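The same POST can be issued from Python using only the standard library. A sketch; `build_request` is a hypothetical helper that mirrors the cURL flags above, and the network call itself is left commented out because it requires a valid IAM token:

```python
import json
import urllib.request

COMPLETION_ASYNC_URL = (
    "https://llm.api.cloud.yandex.net/foundationModels/v1/completionAsync"
)


def build_request(iam_token: str, folder_id: str, body: dict) -> urllib.request.Request:
    # Mirror the cURL call: POST the JSON body with the same three headers.
    return urllib.request.Request(
        COMPLETION_ASYNC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {iam_token}",
            "x-folder-id": folder_id,
        },
        method="POST",
    )


# Sending the request (shown, not executed here):
# with urllib.request.urlopen(build_request(token, folder_id, body)) as resp:
#     operation = json.load(resp)  # the returned object carries the operation "id"
```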
  In the response, the service will return the `operation` object:

  ```json
  {
    "id": "d7qi6shlbvo5********",
    "description": "Async GPT Completion",
    "createdAt": "2023-11-30T18:31:32Z",
    "createdBy": "aje2stn6id9k********",
    "modifiedAt": "2023-11-30T18:31:33Z",
    "done": false,
    "metadata": null
  }
  ```

  Save the operation `id` you get in the response.
- Send a request to get the operation result:

  ```bash
  curl \
    --request GET \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    "https://operation.api.cloud.yandex.net/operations/<operation_ID>"
  ```

  Result example:

  ```json
  {
    "done": true,
    "response": {
      "@type": "type.googleapis.com/yandex.cloud.ai.foundation_models.v1.CompletionResponse",
      "alternatives": [
        {
          "message": {
            "role": "assistant",
            "text": "To be, or not to be, that is the question."
          },
          "status": "ALTERNATIVE_STATUS_FINAL"
        }
      ],
      "usage": {
        "inputTextTokens": "31",
        "completionTokens": "10",
        "totalTokens": "41"
      },
      "modelVersion": "18.01.2024"
    },
    "id": "d7qo21o5fj1u********",
    "description": "Async GPT Completion",
    "createdAt": "2024-05-12T18:46:54Z",
    "createdBy": "ajes08feato8********",
    "modifiedAt": "2024-05-12T18:46:55Z"
  }
  ```
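Once `done` is `true`, the generated text sits several levels deep in the operation object. A minimal parsing sketch; the `extract_completion_text` helper is hypothetical, while the field names match the result example above:

```python
import json

# Trimmed version of the operation result shown above.
operation_json = """
{
  "done": true,
  "response": {
    "alternatives": [
      {
        "message": {
          "role": "assistant",
          "text": "To be, or not to be, that is the question."
        },
        "status": "ALTERNATIVE_STATUS_FINAL"
      }
    ]
  }
}
"""


def extract_completion_text(operation: dict) -> str:
    # Pull the assistant's text out of a completed operation object.
    if not operation.get("done"):
        raise RuntimeError("operation is still running; poll again later")
    return operation["response"]["alternatives"][0]["message"]["text"]


print(extract_completion_text(json.loads(operation_json)))
# Prints: To be, or not to be, that is the question.
```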
See also
- Text generation overview
- Examples of working with ML SDK on GitHub