Sending a request in prompt mode

Written by Yandex Cloud
Updated on April 11, 2025
  • Getting started
  • Request to a model via the REST API
  • Request to a model via the gRPC API
    • Streaming request via the gRPC API

To generate text in prompt mode, send a request to the model using the completion API method or the Yandex Cloud ML SDK.

Getting started

SDK

To run the request examples using the SDK:

  1. Create a service account and assign the ai.languageModels.user role to it.

  2. Get the service account API key and save it.

    The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication; a short sketch follows these steps. For more information, see Authentication in Yandex Cloud ML SDK.

  3. Use the pip package manager to install the ML SDK library:

    pip install yandex-cloud-ml-sdk
    
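The steps above authenticate with an API key. As noted in step 2, the SDK also accepts an IAM token. A minimal sketch under that assumption (the environment variable names here are ours):

    import os

    from yandex_cloud_ml_sdk import YCloudML

    # Assumption: an IAM token string can be passed through the same auth
    # parameter used for API keys in the examples below; IAM_TOKEN and
    # FOLDER_ID are hypothetical environment variable names.
    sdk = YCloudML(
        folder_id=os.environ["FOLDER_ID"],
        auth=os.environ["IAM_TOKEN"],
    )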

cURL

Get API authentication credentials as described in Authentication with the Yandex Foundation Models API.

To use the examples, install cURL.

Python

Get API authentication credentials as described in Authentication with the Yandex Foundation Models API.

Request to a model via the REST API

cURL

The example below is intended to run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.

  1. Create a file with the request body, e.g., body.json:

    {
      "modelUri": "gpt://<folder_ID>/yandexgpt",
      "completionOptions": {
        "stream": false,
        "temperature": 0.1,
        "maxTokens": "1000",
        "reasoningOptions": {
          "mode": "DISABLED"
        }
      },
      "messages": [
        {
          "role": "system",
          "text": "Translate the text"
        },
        {
          "role": "user",
          "text": "To be, or not to be: that is the question."
        }
      ]
    }
    

    Where:

    • modelUri: ID of the model that will be used to generate the response. The parameter contains the Yandex Cloud folder ID or the tuned model's ID.

    • completionOptions: Request configuration options:

      • stream: Enables streaming of partially generated text. It can either be true or false.

      • temperature: Higher values produce more creative and randomized responses from the model. Valid values range from 0 to 1, inclusive. The default value is 0.3.

      • maxTokens: Sets a limit on the model's output in tokens. The maximum number of tokens per generation depends on the model. For more information, see Quotas and limits in Yandex Foundation Models.

      • reasoningOptions.mode: Reasoning mode parameters. This is an optional parameter. The default value is DISABLED. The possible values are:

        • DISABLED: Reasoning mode is disabled.
        • ENABLED_HIDDEN: Reasoning mode is enabled. The model will decide by itself whether or not to use this mode for each particular request.
    • messages: List of messages that set the context for the model:

      • role: Message sender's role:

        • user: Used for sending user messages to the model.
        • system: Used to set the query context and define the model's behavior.
        • assistant: Used for responses generated by the model. In chat mode, the model's responses tagged with the assistant role are included in the message to save the conversation context. Do not send user messages with this role.
      • text: Message text.

  2. Send a request to the model by running this command:

    export FOLDER_ID=<folder_ID>
    export IAM_TOKEN=<IAM_token>
    curl \
      --request POST \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer ${IAM_TOKEN}" \
      --header "x-folder-id: ${FOLDER_ID}" \
      --data "@<path_to_JSON_file>" \
      "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
    

    Where:

    • FOLDER_ID: ID of the folder for which your account has the ai.languageModels.user role or higher.
    • IAM_TOKEN: IAM token you got before you started.
    Result:
    {
      "result": {
        "alternatives": [
          {
            "message": {
              "role": "assistant",
              "text": "To be, or not to be, that is the question."
            },
            "status": "ALTERNATIVE_STATUS_FINAL"
          }
        ],
        "usage": {
          "inputTextTokens": "28",
          "completionTokens": "10",
          "totalTokens": "38"
        },
        "modelVersion": "06.12.2023"
      }
    }
    
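The response structure shown above also lends itself to scripted post-processing. A minimal sketch, assuming you saved the response body to a file (the response.json name is ours):

    import json

    # Assumption: the curl output above was saved with `-o response.json`.
    with open("response.json") as f:
        response = json.load(f)

    # The generated text sits at result.alternatives[0].message.text,
    # as in the sample response above.
    print(response["result"]["alternatives"][0]["message"]["text"])
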
Python

  1. Create a file named test.py with the model request code:

    import requests
    import argparse
    
    URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"
    
    def run(iam_token, folder_id, user_text):    
        # Building a request
        data = {}
        # Specifying model type
        data["modelUri"] = f"gpt://{folder_id}/yandexgpt"
        # Configuring options
        data["completionOptions"] = {"temperature": 0.3, "maxTokens": 1000}
        # Specifying context for the model
        data["messages"] = [
            {"role": "system", "text": "Correct errors in the text."},
            {"role": "user", "text": f"{user_text}"},
        ]
        
        # Sending the request
        response = requests.post(
            URL,
            headers={
                "Accept": "application/json",
                "Authorization": f"Bearer {iam_token}"
            },
            json=data,
        ).json()
    
        # Printing the result
        print(response)
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument("--iam_token", required=True, help="IAM token")
        parser.add_argument("--folder_id", required=True, help="Folder id")
        parser.add_argument("--user_text", required=True, help="User text")
        args = parser.parse_args()
        run(args.iam_token, args.folder_id, args.user_text)
    
  2. Run the test.py file, providing the IAM token and folder ID values:

    export IAM_TOKEN=<IAM_token>
    export FOLDER_ID=<folder_ID>
    export TEXT='Erors wont corrct themselfs'
    python test.py \
      --iam_token ${IAM_TOKEN} \
      --folder_id ${FOLDER_ID} \
      --user_text "${TEXT}"
    
    Result:
    {'result': {'alternatives': [{'message': {'role': 'assistant', 'text': 'Errors will not correct themselves.'}, 'status': 'ALTERNATIVE_STATUS_FINAL'}], 'usage': {'inputTextTokens': '29', 'completionTokens': '9', 'totalTokens': '38'}, 'modelVersion': '07.03.2024'}}
    
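The script above prints the raw response and ignores HTTP errors. A hedged hardening sketch of the same request with basic error handling (the complete helper name is ours):

    import os

    import requests

    URL = "https://llm.api.cloud.yandex.net/foundationModels/v1/completion"

    def complete(iam_token: str, folder_id: str, user_text: str) -> str:
        data = {
            "modelUri": f"gpt://{folder_id}/yandexgpt",
            "completionOptions": {"temperature": 0.3, "maxTokens": 1000},
            "messages": [
                {"role": "system", "text": "Correct errors in the text."},
                {"role": "user", "text": user_text},
            ],
        }
        resp = requests.post(
            URL,
            headers={"Authorization": f"Bearer {iam_token}"},
            json=data,
            timeout=60,
        )
        # Fail fast on 4xx/5xx instead of printing an error body as a result.
        resp.raise_for_status()
        return resp.json()["result"]["alternatives"][0]["message"]["text"]

    print(complete(os.environ["IAM_TOKEN"], os.environ["FOLDER_ID"], "Erors wont corrct themselfs"))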

Request to a model via the gRPC API

SDK
  1. Create a file named generate-text.py and paste the following code into it:

    #!/usr/bin/env python3
    
    from __future__ import annotations
    from yandex_cloud_ml_sdk import YCloudML
    
    messages = [
        {
            "role": "system",
            "text": "Find errors in the text and correct them",
        },
        {
            "role": "user",
            "text": """Laminate flooring is sutiable for instalation in the kitchen or in a child's 
    room. It withsatnds moisturre and mechanical dammage thanks to 
    a 0.2 mm thick proctive layer of melamine films and 
    a wax-treated interlocking system.""",
        },
    ]
    
    
    def main():
        sdk = YCloudML(
            folder_id="<folder_ID>",
            auth="<API_key>",
        )
    
        result = (
            sdk.models.completions("yandexgpt").configure(temperature=0.5).run(messages)
        )
    
        for alternative in result:
            print(alternative)
    
    
    if __name__ == "__main__":
        main()
    

    Where:

    Note

    As input data for a request, Yandex Cloud ML SDK can accept a string, a dictionary, an object of the TextMessage class, or an array containing any combination of these data types; a plain-string sketch follows the result below. For more information, see Yandex Cloud ML SDK usage.

    • messages: List of messages that set the context for the model:

      • role: Message sender's role:

        • user: Used for sending user messages to the model.
        • system: Used to set the query context and define the model's behavior.
        • assistant: Used for responses generated by the model. In chat mode, the model's responses tagged with the assistant role are included in the message to save the conversation context. Do not send user messages with this role.
      • text: Message text.

    • <folder_ID>: ID of the folder in which the service account was created.

    • <API_key>: Service account API key you got earlier; it is required for authentication in the API.

      The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

    For more information about accessing a specific model version, see Accessing models.

  2. Run the created file:

    python3 generate-text.py
    

    Result:

    Alternative(role='assistant', text="Laminate flooring is suitable for installation in the kitchen or in a child's room. It withstands moisture and mechanical damage thanks to a 0.2 mm thick protective layer of melamine films and a wax-treated interlocking system.", status=<AlternativeStatus.FINAL: 3>)
    
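As the note above mentions, run() also accepts other input types besides a list of dictionaries. A minimal sketch that passes a plain string, under that assumption:

    from yandex_cloud_ml_sdk import YCloudML

    sdk = YCloudML(
        folder_id="<folder_ID>",
        auth="<API_key>",
    )

    # Assumption (per the note above): a plain string is accepted as input
    # and is treated as a single user message.
    result = sdk.models.completions("yandexgpt").configure(temperature=0.5).run(
        "Find errors in the text and correct them: Erors wont corrct themselfs"
    )

    for alternative in result:
        print(alternative)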

Python

The example below is intended to run on macOS and Linux. To run it on Windows, see how to work with Bash in Microsoft Windows.

  1. Clone the Yandex Cloud API repository:

    git clone https://github.com/yandex-cloud/cloudapi
    
  2. Use the pip package manager to install the grpcio-tools package:

    pip install grpcio-tools
    
  3. Go to the folder hosting the cloned Yandex Cloud API repository:

    cd <path_to_cloudapi_folder>
    
  4. Create the output folder:

    mkdir output
    
  5. Generate the client interface code:

    python -m grpc_tools.protoc -I . -I third_party/googleapis \
      --python_out=output \
      --grpc_python_out=output \
        google/api/http.proto \
        google/api/annotations.proto \
        yandex/cloud/api/operation.proto \
        google/rpc/status.proto \
        yandex/cloud/operation/operation.proto \
        yandex/cloud/validation.proto \
        yandex/cloud/ai/foundation_models/v1/text_generation/text_generation_service.proto \
        yandex/cloud/ai/foundation_models/v1/text_common.proto
    
  6. In the output folder, create a file named test.py with the model request code:

    # coding=utf8
    import argparse
    import grpc
    
    import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
    import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
    import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb
    
    def run(iam_token, folder_id, user_text):
        cred = grpc.ssl_channel_credentials()
        channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
        stub = service_pb_grpc.TextGenerationServiceStub(channel)
    
        request = service_pb.CompletionRequest(
            model_uri=f"gpt://{folder_id}/yandexgpt",
            completion_options=pb.CompletionOptions(
                max_tokens={"value": 2000}, 
                temperature={"value": 0.5}
            ),
        )
        message_system = request.messages.add()
        message_system.role = "system"
        message_system.text = "Correct errors in the text."
    
        message_user = request.messages.add()
        message_user.role = "user"
        message_user.text = user_text
    
        it = stub.Completion(request, metadata=(
            ('authorization', f'Bearer {iam_token}'),
        ))
        for response in it:
            for alternative in response.alternatives:
                print(alternative.message.text)
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument("--iam_token", required=True, help="IAM token")
        parser.add_argument("--folder_id", required=True, help="Folder id")
        parser.add_argument("--user_text", required=True, help="User text")
        args = parser.parse_args()
        run(args.iam_token, args.folder_id, args.user_text)
    
  7. Run the test.py file, providing the IAM token and folder ID values:

    export IAM_TOKEN=<IAM_token>
    export FOLDER_ID=<folder_ID>
    export TEXT='Erors wont corrct themselfs'
    python output/test.py \
      --iam_token ${IAM_TOKEN} \
      --folder_id ${FOLDER_ID} \
      --user_text "${TEXT}"
    
    Result:
    Errors will not correct themselves.
    
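The example authenticates with an IAM token in the authorization metadata. If you use a service account API key instead, Yandex Cloud APIs accept the Api-Key scheme in the same header; a hedged sketch of the changed call:

    # Assumption: API-key authentication via the Api-Key scheme is accepted
    # by this endpoint; an api_key variable would replace the iam_token
    # argument used in test.py above.
    it = stub.Completion(request, metadata=(
        ('authorization', f'Api-Key {api_key}'),
    ))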

Streaming request via the gRPC API

SDK

If you use the run_stream method, the server returns intermediate generation results as well as the final one. Each intermediate response contains the entire generation result available so far; until the final response arrives, the result may keep changing as new messages are generated.

The effect of run_stream is most noticeable when generating and processing large texts.

  1. Create a file named generate-text.py and paste the following code into it:

    #!/usr/bin/env python3
    
    from __future__ import annotations
    from yandex_cloud_ml_sdk import YCloudML
    
    messages = [
        {"role": "system", "text": "Find errors in the text and correct them"},
        {"role": "user", "text": "Erors wyll not corrct themselfs."},
    ]
    
    
    def main():
        sdk = YCloudML(
            folder_id="<folder_ID>",
            auth="<API_key>",
        )
    
        model = sdk.models.completions("yandexgpt")
    
        for result in model.configure(temperature=0.5).run_stream(messages):
            for alternative in result:
                print(alternative)
    
    
    if __name__ == "__main__":
        main()
    

    Where:

    • messages: List of messages that set the context for the model:

      • role: Message sender's role:

        • user: Used for sending user messages to the model.
        • system: Used to set the query context and define the model's behavior.
        • assistant: Used for responses generated by the model. In chat mode, the model's responses tagged with the assistant role are included in the message to save the conversation context. Do not send user messages with this role.
      • text: Message text.

    • <folder_ID>: ID of the folder in which the service account was created.

    • <API_key>: Service account API key you got earlier; it is required for authentication in the API.

      The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

    For more information about accessing a specific model version, see Accessing models.

  2. Run the created file:

    python3 generate-text.py
    

    Result:

    Alternative(role='assistant', text='O', status=<AlternativeStatus.PARTIAL: 1>)
    Alternative(role='assistant', text='Errors will not correct themselves.', status=<AlternativeStatus.FINAL: 3>)
    
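Because each partial result contains the whole text generated so far, a consumer that only needs the final text can simply keep the last value seen. A minimal sketch built on the example above:

    from yandex_cloud_ml_sdk import YCloudML

    sdk = YCloudML(
        folder_id="<folder_ID>",
        auth="<API_key>",
    )

    messages = [
        {"role": "system", "text": "Find errors in the text and correct them"},
        {"role": "user", "text": "Erors wyll not corrct themselfs."},
    ]

    model = sdk.models.completions("yandexgpt")

    final_text = None
    for result in model.configure(temperature=0.5).run_stream(messages):
        for alternative in result:
            # Each partial alternative carries the full text so far,
            # so the last one seen is the final generation.
            final_text = alternative.text

    print(final_text)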

Python

With the stream parameter enabled, the server returns intermediate generation results as well as the final one. Each intermediate response contains the entire generation result available so far; until the final response arrives, the result may keep changing as new messages are generated.

The effect of the stream parameter is most noticeable when generating and processing large texts.

Warning

The stream parameter is not available for the model's asynchronous mode.

Generate the gRPC client interface code as described in the guide above. At step 6, create a file named test.py with the following code to access the model.

# coding=utf8
import argparse
import grpc

import yandex.cloud.ai.foundation_models.v1.text_common_pb2 as pb
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2_grpc as service_pb_grpc
import yandex.cloud.ai.foundation_models.v1.text_generation.text_generation_service_pb2 as service_pb

def run(iam_token, folder_id, user_text):
    cred = grpc.ssl_channel_credentials()
    channel = grpc.secure_channel('llm.api.cloud.yandex.net:443', cred)
    stub = service_pb_grpc.TextGenerationServiceStub(channel)

    request = service_pb.CompletionRequest(
        model_uri=f"gpt://{folder_id}/yandexgpt",
        completion_options=pb.CompletionOptions(
            max_tokens={"value": 2000},
            temperature={"value": 0.5},
            stream=True
        ),
    )
    message_system = request.messages.add()
    message_system.role = "system"
    message_system.text = "Correct errors in the text."

    message_user = request.messages.add()
    message_user.role = "user"
    message_user.text = user_text

    it = stub.Completion(request, metadata=(
        ('authorization', f'Bearer {iam_token}'),
    ))

    for response in it:
        print(response)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--iam_token", required=True, help="IAM token")
    parser.add_argument("--folder_id", required=True, help="Folder id")
    parser.add_argument("--user_text", required=True, help="User text")
    args = parser.parse_args()
    run(args.iam_token, args.folder_id, args.user_text)
Result:
alternatives {
  message {
    role: "assistant"
    text: "E"
  }
  status: ALTERNATIVE_STATUS_PARTIAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 1
  total_tokens: 30
}
model_version: "07.03.2024"

alternatives {
  message {
    role: "assistant"
    text: "Errors will not correct themselves."
  }
  status: ALTERNATIVE_STATUS_FINAL
}
usage {
  input_text_tokens: 29
  completion_tokens: 9
  total_tokens: 38
}
model_version: "07.03.2024"
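To print only the final text from this stream, you can filter on the alternative status shown in the output above. A sketch, assuming the AlternativeStatus enum is nested in the generated Alternative class, as is usual for protobuf:

    # Inside the response loop of test.py above:
    for response in it:
        for alternative in response.alternatives:
            # Assumption: nested-enum access on the generated class, matching
            # the ALTERNATIVE_STATUS_FINAL value in the output above.
            if alternative.status == pb.Alternative.ALTERNATIVE_STATUS_FINAL:
                print(alternative.message.text)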

See also

  • Text generation overview
  • Examples of working with ML SDK on GitHub
