Sending a request in prompt mode
To generate text in prompt mode, send a request to the model using the completion method or Yandex Cloud ML SDK.
Getting started
To use the SDK request examples:

- Create a service account and assign the `ai.languageModels.user` role to it.
- Get the service account API key and save it.

  The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

- Use the `pip` package manager to install the ML SDK library:

  ```bash
  pip install yandex-cloud-ml-sdk
  ```
To use the cURL and raw API examples, install cURL and get API authentication credentials as described in Authentication with the Yandex Foundation Models API.
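If you plan to run the SDK examples, you can check that the key and folder ID are picked up correctly before moving on. A minimal sketch, assuming the credentials are exported in the `YC_FOLDER_ID` and `YC_API_KEY` environment variables (these names are this article's assumption, not an SDK convention):

```python
import os

from yandex_cloud_ml_sdk import YCloudML

# YC_FOLDER_ID and YC_API_KEY are assumed names; export them yourself.
sdk = YCloudML(
    folder_id=os.environ["YC_FOLDER_ID"],
    auth=os.environ["YC_API_KEY"],  # service account API key
)

# Creating the model object does not call the API yet; a request is sent
# only when run() is invoked, as in the examples below.
model = sdk.models.completions("yandexgpt")
print(model)
```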
Request to a model via the REST API
The example below is intended to be run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.
- Create a file with the request body, e.g., `body.json`:

  ```json
  {
    "modelUri": "gpt://<folder_ID>/yandexgpt-lite",
    "completionOptions": {
      "stream": false,
      "temperature": 0.1,
      "maxTokens": "1000"
    },
    "messages": [
      {
        "role": "system",
        "text": "Translate the text"
      },
      {
        "role": "user",
        "text": "To be, or not to be: that is the question."
      }
    ]
  }
  ```
  Where:

  - `modelUri`: ID of the model that will be used to generate the response. The parameter contains the Yandex Cloud folder ID or the ID of the tuned model.
  - `completionOptions`: Request configuration options:
    - `stream`: Enables streaming of partially generated text. It can be either `true` or `false`.
    - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. Its values range from `0` to `1`, inclusive. The default value is `0.3`.
    - `maxTokens`: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Quotas and limits in Yandex Foundation Models.
  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used for sending user messages to the model.
      - `system`: Used to set the query context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to save the conversation context. Do not send user messages with this role.
    - `text`: Message text.
- Send a request to the model by running this command:

  ```bash
  export FOLDER_ID=<folder_ID>
  export IAM_TOKEN=<IAM_token>
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "x-folder-id: ${FOLDER_ID}" \
    --data "@<path_to_JSON_file>" \
    "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
  ```
  Where:

  - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
  - `IAM_TOKEN`: IAM token you got before you started.
  Result:

  ```json
  {
    "result": {
      "alternatives": [
        {
          "message": {
            "role": "assistant",
            "text": "To be, or not to be: that is the question."
          },
          "status": "ALTERNATIVE_STATUS_FINAL"
        }
      ],
      "usage": {
        "inputTextTokens": "28",
        "completionTokens": "10",
        "totalTokens": "38"
      },
      "modelVersion": "06.12.2023"
    }
  }
  ```
- Create a file named `test.py` with the model request code:

  ```python
  import requests
  import argparse

  URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"


  def run(iam_token, folder_id, user_text):
      # Building a request
      data = {}
      # Specifying model type
      data["modelUri"] = f"gpt://{folder_id}/yandexgpt"
      # Configuring options
      data["completionOptions"] = {"temperature": 0.3, "maxTokens": 1000}
      # Specifying context for the model
      data["messages"] = [
          {"role": "system", "text": "Correct errors in the text."},
          {"role": "user", "text": user_text},
      ]
      # Sending the request
      response = requests.post(
          URL,
          headers={
              "Accept": "application/json",
              "Authorization": f"Bearer {iam_token}"
          },
          json=data,
      ).json()
      # Printing the result
      print(response)


  if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      parser.add_argument("--iam_token", required=True, help="IAM token")
      parser.add_argument("--folder_id", required=True, help="Folder ID")
      parser.add_argument("--user_text", required=True, help="User text")
      args = parser.parse_args()
      run(args.iam_token, args.folder_id, args.user_text)
  ```
- Run the `test.py` file, providing the IAM token and folder ID values:

  ```bash
  export IAM_TOKEN=<IAM_token>
  export FOLDER_ID=<folder_ID>
  export TEXT='Erors wont corrct themselfs'
  python test.py \
    --iam_token ${IAM_TOKEN} \
    --folder_id ${FOLDER_ID} \
    --user_text "${TEXT}"
  ```
  Result:

  ```text
  {'result': {'alternatives': [{'message': {'role': 'assistant', 'text': 'Errors will not correct themselves.'}, 'status': 'ALTERNATIVE_STATUS_FINAL'}], 'usage': {'inputTextTokens': '29', 'completionTokens': '9', 'totalTokens': '38'}, 'modelVersion': '07.03.2024'}}
  ```
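The script above prints the raw JSON and ignores HTTP errors. A hedged sketch of the same request with status checking and extraction of just the generated text (the `complete` function name is illustrative, not part of the API):

```python
import requests

URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"


def complete(iam_token: str, folder_id: str, user_text: str) -> str:
    """Send a completion request and return only the generated text."""
    data = {
        "modelUri": f"gpt://{folder_id}/yandexgpt",
        "completionOptions": {"temperature": 0.3, "maxTokens": 1000},
        "messages": [
            {"role": "system", "text": "Correct errors in the text."},
            {"role": "user", "text": user_text},
        ],
    }
    response = requests.post(
        URL,
        headers={"Authorization": f"Bearer {iam_token}"},
        json=data,
    )
    # Fail loudly on 4xx/5xx, e.g., an expired IAM token or a missing role
    response.raise_for_status()
    return response.json()["result"]["alternatives"][0]["message"]["text"]
```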
Request to a model via the gRPC API
- Create a file named `generate-text.py` and paste the following code into it:

  ```python
  #!/usr/bin/env python3

  from __future__ import annotations

  from yandex_cloud_ml_sdk import YCloudML

  messages = [
      {
          "role": "system",
          "text": "Find errors in the text and correct them",
      },
      {
          "role": "user",
          "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's room. It withsatnds moisturre and mechanical dammage thanks to a 0.2 mm thick proctive layer of melamine films and a wax-treated interlocking system.""",
      },
  ]


  def main():
      sdk = YCloudML(
          folder_id="<folder_ID>",
          auth="<API_key>",
      )

      result = (
          sdk.models.completions("yandexgpt").configure(temperature=0.5).run(messages)
      )

      for alternative in result:
          print(alternative)


  if __name__ == "__main__":
      main()
  ```
  Note

  As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the `TextMessage` class, or an array containing any combination of these data types. For more information, see Yandex Cloud ML SDK usage.

  Where:

  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used for sending user messages to the model.
      - `system`: Used to set the query context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to save the conversation context; a sketch of this appears after the result below. Do not send user messages with this role.
    - `text`: Message text.
  - `<folder_ID>`: ID of the folder in which the service account was created.
  - `<API_key>`: Service account API key you got earlier; it is required for authentication in the API.

    The examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.
- Run the created file:

  ```bash
  python3 generate-text.py
  ```
  Result:

  ```text
  Alternative(role='assistant', text='Laminate flooring is suitable for installation in the kitchen or in a child's room. It withstands moisture and mechanical damage thanks to a 0.2 mm thick protective layer of melamine films and a wax-treated interlocking system.', status=<AlternativeStatus.FINAL: 3>)
  ```
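As noted in the role descriptions above, a conversation is continued by appending the model's reply to the message list under the `assistant` role before the next user message. A sketch based on the same SDK example; the follow-up prompt is made up for illustration:

```python
#!/usr/bin/env python3

from yandex_cloud_ml_sdk import YCloudML

sdk = YCloudML(
    folder_id="<folder_ID>",
    auth="<API_key>",
)
model = sdk.models.completions("yandexgpt").configure(temperature=0.5)

messages = [
    {"role": "system", "text": "Find errors in the text and correct them"},
    {"role": "user", "text": "Erors wyll not corrct themselfs."},
]

result = model.run(messages)
first = next(iter(result))  # the first (and here the only) alternative
print(first.text)

# Feed the reply back under the `assistant` role to preserve the
# conversation context, then send a follow-up user message.
messages.append({"role": "assistant", "text": first.text})
messages.append({"role": "user", "text": "Now translate the corrected text into German."})

print(next(iter(model.run(messages))).text)
```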
The example below is intended to be run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.
- Clone the Yandex Cloud API repository:

  ```bash
  git clone https://github.com/yandex-cloud/cloudapi
  ```
- Use the `pip` package manager to install the `grpcio-tools` package:

  ```bash
  pip install grpcio-tools
  ```
- Go to the folder hosting the cloned Yandex Cloud API repository:

  ```bash
  cd <path_to_cloudapi_folder>
  ```
- Create the `output` folder:

  ```bash
  mkdir output
  ```
- Generate the client interface code:

  ```bash
  python -m grpc_tools.protoc -I . -I third_party/googleapis \
    --python_out=output \
    --grpc_python_out=output \
    google/api/http.proto \
    google/api/annotations.proto \
    yandex/cloud/api/operation.proto \
    google/rpc/status.proto \
    yandex/cloud/operation/operation.proto \
    yandex/cloud/validation.proto \
    yandex/cloud/ai/foundation_models/v1/text_generation/text_generation_service.proto \
    yandex/cloud/ai/foundation_models/v1/text_common.proto
  ```
- In the `output` folder, create a file named `test.py` with the model request code:

  ```python
  # coding=utf8
  import argparse

  import grpc

  import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
  import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
  import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb


  def run(iam_token, folder_id, user_text):
      cred = grpc.ssl_channel_credentials()
      channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
      stub = service_pb_grpc.TextGenerationServiceStub(channel)

      request = service_pb.CompletionRequest(
          model_uri=f"gpt://{folder_id}/yandexgpt",
          completion_options=pb.CompletionOptions(
              max_tokens={"value": 2000},
              temperature={"value": 0.5}
          ),
      )

      message_system = request.messages.add()
      message_system.role = "system"
      message_system.text = "Correct errors in the text."

      message_user = request.messages.add()
      message_user.role = "user"
      message_user.text = user_text

      it = stub.Completion(request, metadata=(
          ('authorization', f'Bearer {iam_token}'),
      ))
      for response in it:
          for alternative in response.alternatives:
              print(alternative.message.text)


  if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      parser.add_argument("--iam_token", required=True, help="IAM token")
      parser.add_argument("--folder_id", required=True, help="Folder ID")
      parser.add_argument("--user_text", required=True, help="User text")
      args = parser.parse_args()
      run(args.iam_token, args.folder_id, args.user_text)
  ```
- Run the `test.py` file, providing the IAM token and folder ID values:

  ```bash
  export IAM_TOKEN=<IAM_token>
  export FOLDER_ID=<folder_ID>
  export TEXT='Erors wont corrct themselfs'
  python output/test.py \
    --iam_token ${IAM_TOKEN} \
    --folder_id ${FOLDER_ID} \
    --user_text "${TEXT}"
  ```
  Result:

  ```text
  Errors will not correct themselves.
  ```
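The gRPC `CompletionResponse` also carries the same usage counters you saw in the REST response. Below is a variation of `test.py` that prints them and closes the channel deterministically by using it as a context manager; this is a sketch of one possible structure, not a required pattern:

```python
# coding=utf8
import grpc

import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb


def run(iam_token, folder_id, user_text):
    cred = grpc.ssl_channel_credentials()
    # The channel and its sockets are closed when the `with` block exits
    with grpc.secure_channel('llm.api.cloud.yandex.net:443', cred) as channel:
        stub = service_pb_grpc.TextGenerationServiceStub(channel)
        request = service_pb.CompletionRequest(
            model_uri=f"gpt://{folder_id}/yandexgpt",
            completion_options=pb.CompletionOptions(
                max_tokens={"value": 2000},
                temperature={"value": 0.5},
            ),
        )
        message_user = request.messages.add()
        message_user.role = "user"
        message_user.text = user_text

        it = stub.Completion(request, metadata=(
            ('authorization', f'Bearer {iam_token}'),
        ))
        for response in it:
            for alternative in response.alternatives:
                print(alternative.message.text)
            # Usage counters mirror the REST response's `usage` object
            print("total tokens:", response.usage.total_tokens)
```

The function takes the same arguments as `run()` in `test.py`, so it can be plugged into the same `argparse` wrapper.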
Streaming request via the gRPC API
If the `run_stream` method is used, the server returns not only the final text generation result but intermediate results as well. Each intermediate response contains the whole generation result available so far. Until the final response is received, the generation results may change as new messages arrive.

The difference the `run_stream` method makes is most noticeable when generating and processing large texts. One way to consume the partial results is shown in the sketch after the example below.
- Create a file named `generate-text.py` and paste the following code into it:

  ```python
  #!/usr/bin/env python3

  from __future__ import annotations

  from yandex_cloud_ml_sdk import YCloudML

  messages = [
      {"role": "system", "text": "Find errors in the text and correct them"},
      {"role": "user", "text": "Erors wyll not corrct themselfs."},
  ]


  def main():
      sdk = YCloudML(
          folder_id="<folder_ID>",
          auth="<API_key>",
      )

      model = sdk.models.completions("yandexgpt")

      for result in model.configure(temperature=0.5).run_stream(messages):
          for alternative in result:
              print(alternative)


  if __name__ == "__main__":
      main()
  ```
  Where:

  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used for sending user messages to the model.
      - `system`: Used to set the query context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to save the conversation context. Do not send user messages with this role.
    - `text`: Message text.
  - `<folder_ID>`: ID of the folder in which the service account was created.
  - `<API_key>`: Service account API key you got earlier; it is required for authentication in the API.

    The examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.
- Run the created file:

  ```bash
  python3 generate-text.py
  ```
  Result:

  ```text
  Alternative(role='assistant', text='O', status=<AlternativeStatus.PARTIAL: 1>)
  Alternative(role='assistant', text='Errors will not correct themselves.', status=<AlternativeStatus.FINAL: 3>)
  ```
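Because the intermediate results may change until the final response arrives, a consumer that only needs the finished text can simply keep the last alternative it sees. A minimal sketch under the same setup as the example above:

```python
#!/usr/bin/env python3

from yandex_cloud_ml_sdk import YCloudML

messages = [
    {"role": "system", "text": "Find errors in the text and correct them"},
    {"role": "user", "text": "Erors wyll not corrct themselfs."},
]


def main():
    sdk = YCloudML(
        folder_id="<folder_ID>",
        auth="<API_key>",
    )
    model = sdk.models.completions("yandexgpt")

    final_text = None
    for result in model.configure(temperature=0.5).run_stream(messages):
        for alternative in result:
            # Overwritten on every partial response; after the stream
            # ends this holds the text of the final alternative.
            final_text = alternative.text

    print(final_text)


if __name__ == "__main__":
    main()
```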
With the `stream` parameter enabled, the server returns not only the final text generation result but intermediate results as well. Each intermediate response contains the whole generation result available so far. Until the final response is received, the generation results may change as new messages arrive.

The difference the `stream` parameter makes is most noticeable when generating and processing large texts.

Warning

The `stream` parameter is not available for the model's asynchronous mode.
Generate the gRPC client interface code as described in this guide. At step 6, create a file named `test.py` with the following code to access the model:
```python
# coding=utf8
import argparse

import grpc

import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb


def run(iam_token, folder_id, user_text):
    cred = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
    stub = service_pb_grpc.TextGenerationServiceStub(channel)

    request = service_pb.CompletionRequest(
        model_uri=f"gpt://{folder_id}/yandexgpt",
        completion_options=pb.CompletionOptions(
            max_tokens={"value": 2000},
            temperature={"value": 0.5},
            stream=True
        ),
    )

    message_system = request.messages.add()
    message_system.role = "system"
    message_system.text = "Correct errors in the text."

    message_user = request.messages.add()
    message_user.role = "user"
    message_user.text = user_text

    it = stub.Completion(request, metadata=(
        ('authorization', f'Bearer {iam_token}'),
    ))
    for response in it:
        print(response)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--iam_token", required=True, help="IAM token")
    parser.add_argument("--folder_id", required=True, help="Folder ID")
    parser.add_argument("--user_text", required=True, help="User text")
    args = parser.parse_args()
    run(args.iam_token, args.folder_id, args.user_text)
```
Result:

```text
alternatives {
  message {
    role: "assistant"
    text: "E"
  }
  status: ALTERNATIVE_STATUS_PARTIAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 1
  total_tokens: 30
}
model_version: "07.03.2024"

alternatives {
  message {
    role: "assistant"
    text: "Errors will not correct themselves."
  }
  status: ALTERNATIVE_STATUS_FINAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 9
  total_tokens: 38
}
model_version: "07.03.2024"
```
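If only the finished text is needed, the printing loop in `test.py` can filter on the alternative status. A hedged sketch; it assumes the `ALTERNATIVE_STATUS_FINAL` enum value is reachable through the generated `text_common_pb2` module imported as `pb`, which matches how protobuf exposes nested enums:

```python
# Drop-in replacement for the `for response in it:` loop in test.py above.
# Keeps only the alternatives that the server marks as final.
for response in it:
    for alternative in response.alternatives:
        if alternative.status == pb.Alternative.ALTERNATIVE_STATUS_FINAL:
            print(alternative.message.text)
```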
See also
- Text generation overview
- Examples of working with ML SDK on GitHub