# Sending a request in prompt mode

To generate text in prompt mode, send a request to the model using the `completion` method.
## Getting started

Get API authentication credentials as described in Authentication with the Yandex Foundation Models API.
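If you authenticate with a Yandex account, the credentials step boils down to exchanging an OAuth token for an IAM token. Below is a minimal sketch of that exchange using only the standard library; the endpoint URL and field names follow the IAM API, but verify them against the authentication guide before relying on this, and note that the `build_token_request` helper is our own, not part of any SDK:

```python
import json
import urllib.request

# IAM token exchange endpoint (check the authentication guide for the current URL)
IAM_URL = "https://iam.api.cloud.yandex.net/iam/v1/tokens"


def build_token_request(oauth_token: str) -> urllib.request.Request:
    """Build the POST request that exchanges an OAuth token for an IAM token."""
    body = json.dumps({"yandexPassportOauthToken": oauth_token}).encode()
    return urllib.request.Request(
        IAM_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Actually sending the request needs network access and a valid OAuth token:
# with urllib.request.urlopen(build_token_request("<OAuth_token>")) as resp:
#     iam_token = json.loads(resp.read())["iamToken"]
```

The returned IAM token is what goes into the `Authorization: Bearer` header in the examples below.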
## Request to a model via the REST API
To use the examples, install cURL.

The example below is intended to be run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.
- Create a file with the request body, e.g., `body.json`:

  ```json
  {
    "modelUri": "gpt://<folder_ID>/yandexgpt-lite",
    "completionOptions": {
      "stream": false,
      "temperature": 0.1,
      "maxTokens": "1000"
    },
    "messages": [
      {
        "role": "system",
        "text": "Translate the text"
      },
      {
        "role": "user",
        "text": "To be, or not to be: that is the question."
      }
    ]
  }
  ```
  Where:

  - `modelUri`: ID of the model that will generate the response. The parameter contains the ID of a Yandex Cloud folder or the ID of a model fine-tuned in DataSphere.
  - `completionOptions`: Request configuration options:
    - `stream`: Enables streaming of partially generated text. Accepts either `true` or `false`.
    - `temperature`: With a higher temperature, you get more creative and randomized responses from the model. This parameter accepts values between `0` and `1`, inclusive. The default value is `0.3`.
    - `maxTokens`: Sets a limit on the model's output, in tokens. The maximum number of tokens per generation depends on the model. For more information, see Quotas and limits in Yandex Foundation Models.
  - `messages`: List of messages that set the context for the model:
    - `role`: Message sender's role:
      - `user`: Used to send user messages to the model.
      - `system`: Used to set request context and define the model's behavior.
      - `assistant`: Used for responses generated by the model. In chat mode, the model's responses tagged with the `assistant` role are included in the message to preserve the conversation context. Do not send user messages with this role.
    - `text`: Text content of the message.
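If you prefer to assemble the request body in code rather than write JSON by hand, the structure above maps directly onto a Python dictionary. A minimal sketch (field names and defaults taken from the example; the `build_body` helper is ours, not part of any SDK):

```python
def build_body(folder_id, user_text, temperature=0.1, max_tokens=1000):
    """Build a completion request body matching the JSON example above."""
    # The API accepts temperatures between 0 and 1, inclusive
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    return {
        "modelUri": f"gpt://{folder_id}/yandexgpt-lite",
        "completionOptions": {
            "stream": False,
            "temperature": temperature,
            # maxTokens is a string in the JSON example, so mirror that here
            "maxTokens": str(max_tokens),
        },
        "messages": [
            {"role": "system", "text": "Translate the text"},
            {"role": "user", "text": user_text},
        ],
    }
```

Serialize the result with `json.dumps()` to produce the same `body.json` content shown above.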
- Send a request to the model by running this command:

  ```shell
  export FOLDER_ID=<folder_ID>
  export IAM_TOKEN=<IAM_token>
  curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --header "x-folder-id: ${FOLDER_ID}" \
    --data "@<path_to_JSON_file>" \
    "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
  ```
  Where:

  - `FOLDER_ID`: ID of the folder for which your account has the `ai.languageModels.user` role or higher.
  - `IAM_TOKEN`: IAM token you got before you started.
  Result:

  ```json
  {
    "result": {
      "alternatives": [
        {
          "message": {
            "role": "assistant",
            "text": "To be or not to be: that is the question."
          },
          "status": "ALTERNATIVE_STATUS_FINAL"
        }
      ],
      "usage": {
        "inputTextTokens": "28",
        "completionTokens": "10",
        "totalTokens": "38"
      },
      "modelVersion": "06.12.2023"
    }
  }
  ```
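In most applications you will want just the generated text and the token usage out of this response. A short sketch of picking those fields out of the JSON (using the sample response from this guide as input; note that the token counts come back as strings and need converting before arithmetic):

```python
import json

# The response body returned by the REST endpoint in the example above
raw = '''
{
  "result": {
    "alternatives": [
      {
        "message": {"role": "assistant", "text": "To be or not to be: that is the question."},
        "status": "ALTERNATIVE_STATUS_FINAL"
      }
    ],
    "usage": {"inputTextTokens": "28", "completionTokens": "10", "totalTokens": "38"},
    "modelVersion": "06.12.2023"
  }
}
'''

result = json.loads(raw)["result"]
# First alternative holds the generated message
text = result["alternatives"][0]["message"]["text"]
# Token counts are strings in the JSON, so convert before using them as numbers
total_tokens = int(result["usage"]["totalTokens"])

print(text)          # To be or not to be: that is the question.
print(total_tokens)  # 38
```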
- Create a file named `test.py` with the model request code:

  ```python
  import requests
  import argparse

  URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"


  def run(iam_token, folder_id, user_text):
      # Build the request
      data = {}
      # Specify the model type
      data["modelUri"] = f"gpt://{folder_id}/yandexgpt"
      # Configure options
      data["completionOptions"] = {"temperature": 0.3, "maxTokens": 1000}
      # Specify the context for the model
      data["messages"] = [
          {"role": "system", "text": "Correct errors in the text."},
          {"role": "user", "text": f"{user_text}"},
      ]
      # Send the request
      response = requests.post(
          URL,
          headers={
              "Accept": "application/json",
              "Authorization": f"Bearer {iam_token}"
          },
          json=data,
      ).json()
      # Print the result
      print(response)


  if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      parser.add_argument("--iam_token", required=True, help="IAM token")
      parser.add_argument("--folder_id", required=True, help="Folder ID")
      parser.add_argument("--user_text", required=True, help="User text")
      args = parser.parse_args()
      run(args.iam_token, args.folder_id, args.user_text)
  ```
- Run the `test.py` file, substituting the IAM token and folder ID values. Note that `${TEXT}` must be quoted, since the sample text contains spaces:

  ```shell
  export IAM_TOKEN=<IAM_token>
  export FOLDER_ID=<folder_ID>
  export TEXT='Erors wont corrct themselfs'
  python test.py \
    --iam_token "${IAM_TOKEN}" \
    --folder_id "${FOLDER_ID}" \
    --user_text "${TEXT}"
  ```
  Result:

  ```
  {'result': {'alternatives': [{'message': {'role': 'assistant', 'text': 'Errors will not correct themselves.'}, 'status': 'ALTERNATIVE_STATUS_FINAL'}], 'usage': {'inputTextTokens': '29', 'completionTokens': '9', 'totalTokens': '38'}, 'modelVersion': '07.03.2024'}}
  ```
## Request to a model via the gRPC API

The example below is intended to be run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.
- Clone the Yandex Cloud API repository:

  ```shell
  git clone https://github.com/yandex-cloud/cloudapi
  ```
- Use the pip package manager to install the `grpcio-tools` package:

  ```shell
  pip install grpcio-tools
  ```
- Go to the folder hosting the cloned Yandex Cloud API repository:

  ```shell
  cd <path_to_cloudapi_folder>
  ```
- Create the `output` folder:

  ```shell
  mkdir output
  ```
- Generate the client interface code:

  ```shell
  python -m grpc_tools.protoc -I . -I third_party/googleapis \
    --python_out=output \
    --grpc_python_out=output \
    google/api/http.proto \
    google/api/annotations.proto \
    yandex/cloud/api/operation.proto \
    google/rpc/status.proto \
    yandex/cloud/operation/operation.proto \
    yandex/cloud/validation.proto \
    yandex/cloud/ai/foundation_models/v1/text_generation/text_generation_service.proto \
    yandex/cloud/ai/foundation_models/v1/text_common.proto
  ```
- In the `output` folder, create a file named `test.py` with the model request code:

  ```python
  # coding=utf8
  import argparse

  import grpc
  import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
  import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
  import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb


  def run(iam_token, folder_id, user_text):
      cred = grpc.ssl_channel_credentials()
      channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
      stub = service_pb_grpc.TextGenerationServiceStub(channel)

      request = service_pb.CompletionRequest(
          model_uri=f"gpt://{folder_id}/yandexgpt",
          completion_options=pb.CompletionOptions(
              max_tokens={"value": 2000},
              temperature={"value": 0.5}
          ),
      )
      message_system = request.messages.add()
      message_system.role = "system"
      message_system.text = "Correct errors in the text."

      message_user = request.messages.add()
      message_user.role = "user"
      message_user.text = user_text

      it = stub.Completion(request, metadata=(
          ('authorization', f'Bearer {iam_token}'),
      ))
      for response in it:
          for alternative in response.alternatives:
              print(alternative.message.text)


  if __name__ == '__main__':
      parser = argparse.ArgumentParser()
      parser.add_argument("--iam_token", required=True, help="IAM token")
      parser.add_argument("--folder_id", required=True, help="Folder ID")
      parser.add_argument("--user_text", required=True, help="User text")
      args = parser.parse_args()
      run(args.iam_token, args.folder_id, args.user_text)
  ```
- Run the `test.py` file, substituting the IAM token and folder ID values. Note that `${TEXT}` must be quoted, since the sample text contains spaces:

  ```shell
  export IAM_TOKEN=<IAM_token>
  export FOLDER_ID=<folder_ID>
  export TEXT='Erors wont corrct themselfs'
  python output/test.py \
    --iam_token "${IAM_TOKEN}" \
    --folder_id "${FOLDER_ID}" \
    --user_text "${TEXT}"
  ```
  Result:

  ```
  Errors will not correct themselves.
  ```
## Streaming request via the gRPC API

With the `stream` parameter enabled, the server returns not just the final text generation result but intermediate results as well. Each intermediate response contains the whole generation result available so far. Until the final response is received, the generation result may change as new messages arrive.

The effect of the `stream` parameter is most noticeable when generating and processing large texts.
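Because each intermediate response carries the entire result generated so far, a consumer should replace its buffer with the latest text rather than concatenate chunks. A tiny illustration with made-up intermediate texts standing in for a real stream:

```python
# Hypothetical sequence of intermediate texts, mimicking the shape of a streamed reply
partial_texts = ["E", "Errors will not", "Errors will not correct themselves."]

final_text = ""
for text in partial_texts:
    # Each response holds the whole result so far: replace, don't append
    final_text = text

print(final_text)  # Errors will not correct themselves.
```

Appending the chunks instead would duplicate most of the text, since each message repeats everything generated before it.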
Warning

The `stream` parameter is not available in the model's asynchronous mode.
Generate the gRPC client interface code as described in the guide above. At the step where you create the `test.py` file, use the following code to access the model instead:
```python
# coding=utf8
import argparse

import grpc
import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb


def run(iam_token, folder_id, user_text):
    cred = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
    stub = service_pb_grpc.TextGenerationServiceStub(channel)

    request = service_pb.CompletionRequest(
        model_uri=f"gpt://{folder_id}/yandexgpt",
        completion_options=pb.CompletionOptions(
            max_tokens={"value": 2000},
            temperature={"value": 0.5},
            stream=True
        ),
    )
    message_system = request.messages.add()
    message_system.role = "system"
    message_system.text = "Correct errors in the text."

    message_user = request.messages.add()
    message_user.role = "user"
    message_user.text = user_text

    it = stub.Completion(request, metadata=(
        ('authorization', f'Bearer {iam_token}'),
    ))
    for response in it:
        print(response)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--iam_token", required=True, help="IAM token")
    parser.add_argument("--folder_id", required=True, help="Folder ID")
    parser.add_argument("--user_text", required=True, help="User text")
    args = parser.parse_args()
    run(args.iam_token, args.folder_id, args.user_text)
```
Result:

```
alternatives {
  message {
    role: "assistant"
    text: "E"
  }
  status: ALTERNATIVE_STATUS_PARTIAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 1
  total_tokens: 30
}
model_version: "07.03.2024"

alternatives {
  message {
    role: "assistant"
    text: "Errors will not correct themselves."
  }
  status: ALTERNATIVE_STATUS_FINAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 9
  total_tokens: 38
}
model_version: "07.03.2024"
```