Yandex Cloud
Search
Contact UsGet started
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • AI for business
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
© 2025 Direct Cursus Technology L.L.C.
Yandex AI Studio
    • About Yandex AI Studio
    • Yandex Workflows
    • Quotas and limits
    • Terms and definitions
  • Compatibility with OpenAI
  • Access management
  • Pricing policy
  • Audit Trails events
  • Public materials
  • Release notes

In this article:

  • class yandex_cloud_ml_sdk._models.completions.function.AsyncCompletions
  • class yandex_cloud_ml_sdk._models.completions.model.AsyncGPTModel

Domain

Written by
Yandex Cloud
Updated at September 25, 2025
  • class yandex_cloud_ml_sdk._models.completions.function.AsyncCompletions
  • class yandex_cloud_ml_sdk._models.completions.model.AsyncGPTModel

class yandexcloudmlsdk.models.completions.function.AsyncCompletionsclass yandex_cloud_ml_sdk._models.completions.function.AsyncCompletions

A class for handling completions models.

It defines the core functionality for calling a model to generate completions based on the provided model name and version.

__call__(model_name, *, model_version='latest')

Create a model object to call for generating completions.

This method constructs the URI for the model based on the provided name and version. If the name contains ://, it is treated as a full URI. Otherwise, it looks up the model name in the well-known names dictionary. But after this, in any case, we construct a URI in the form gpt://<folder_id>//.

Parameters

  • model_name (str) – the name or URI of the model to call.
  • model_version (str) – the version of the model to use. Defaults to ‘latest’.

Return type

ModelTypeT

Model

class yandexcloudmlsdk.models.completions.model.AsyncGPTModelclass yandex_cloud_ml_sdk._models.completions.model.AsyncGPTModel

A class for GPT models providing various functionalities including tuning, and batch processing.

async run(messages, *, timeout=180)

Executes the model with the provided messages.

Parameters

  • messages (TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict | Iterable[TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict]) – the input messages to process. Could be a string, a dictionary, or a result object. Read more about other possible message types in the corresponding documentation.
  • timeout – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

GPTModelResult[AsyncToolCall]

async run_stream(messages, *, timeout=180)

Executes the model with the provided messages and yields partial results as they become available.

Parameters

  • messages (TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict | Iterable[TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict]) – the input messages to process.
  • timeout – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncIterator[GPTModelResult[AsyncToolCall]]

async run_deferred(messages, *, timeout=60)

Initiates a deferred execution of the model with the provided messages.

Parameters

  • messages (TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict | Iterable[TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict]) – the input messages to process.
  • timeout – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncOperation[GPTModelResult[AsyncToolCall]]

async attach_deferred(operation_id, timeout=60)

Attaches to an ongoing deferred operation using its operation id.

Parameters

  • operation_id (str) – the id of the deferred operation to attach to.
  • timeout (float) – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncOperation[GPTModelResult[AsyncToolCall]]

async tokenize(messages, *, timeout=60)

Tokenizes the provided messages into a tuple of tokens.

Parameters

  • messages (TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict | Iterable[TextMessage | TextMessageDict | TextMessageProtocol | str | FunctionResultMessageDict]) – the input messages to tokenize.
  • timeout – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

tuple[Token, …]

async tune_deferred(train_datasets, *, validation_datasets=Undefined, name=Undefined, description=Undefined, labels=Undefined, seed=Undefined, lr=Undefined, n_samples=Undefined, additional_arguments=Undefined, tuning_type=Undefined, scheduler=Undefined, optimizer=Undefined, timeout=60)

Initiate a deferred tuning process for the model.

Parameters

  • train_datasets (str | BaseDataset | tuple[str | BaseDataset, float] | Iterable[str | BaseDataset | tuple[str | BaseDataset, float]] | dict[str | BaseDataset, float]) – the dataset objects and/or dataset ids used for training of the model.
  • validation_datasets (str | BaseDataset | tuple[str | BaseDataset, float] | Iterable[str | BaseDataset | tuple[str | BaseDataset, float]] | dict[str | BaseDataset, float] | Undefined) – the dataset objects and/or dataset ids used for validation of the model.
  • name (str | Undefined) – the name of the tuning task.
  • description (str | Undefined) – the description of the tuning task.
  • labels (dict[str, str] | Undefined) – labels for the tuning task.
  • seed (int | Undefined) – a random seed for reproducibility.
  • lr (float | Undefined) – a learning rate for tuning.
  • n_samples (int | Undefined) – a number of samples for tuning.
  • additional_arguments (str | Undefined) – additional arguments for tuning.
  • tuning_type (BaseTuningType | Undefined) – a type of tuning to be applied.
  • scheduler (BaseScheduler | Undefined) – a scheduler for tuning.
  • optimizer (BaseOptimizer | Undefined) – an optimizer for tuning.
  • timeout (float) – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncTuningTask[AsyncGPTModel]

async tune(train_datasets, *, validation_datasets=Undefined, name=Undefined, description=Undefined, labels=Undefined, seed=Undefined, lr=Undefined, n_samples=Undefined, additional_arguments=Undefined, tuning_type=Undefined, scheduler=Undefined, optimizer=Undefined, timeout=60, poll_timeout=259200, poll_interval=60)

Tune the model with the specified training datasets and parameters.

Parameters

  • train_datasets (str | BaseDataset | tuple[str | BaseDataset, float] | Iterable[str | BaseDataset | tuple[str | BaseDataset, float]] | dict[str | BaseDataset, float]) – the dataset objects and/or dataset ids used for training of the model.
  • validation_datasets (str | BaseDataset | tuple[str | BaseDataset, float] | Iterable[str | BaseDataset | tuple[str | BaseDataset, float]] | dict[str | BaseDataset, float] | Undefined) – the dataset objects and/or dataset ids used for validation of the model.
  • name (str | Undefined) – the name of the tuning task.
  • description (str | Undefined) – the description of the tuning task.
  • labels (dict[str, str] | Undefined) – labels for the tuning task.
  • seed (int | Undefined) – a random seed for reproducibility.
  • lr (float | Undefined) – a learning rate for tuning.
  • n_samples (int | Undefined) – a number of samples for tuning.
  • additional_arguments (str | Undefined) – additional arguments for tuning.
  • tuning_type (BaseTuningType | Undefined) – a type of tuning to be applied.
  • scheduler (BaseScheduler | Undefined) – a scheduler for tuning.
  • optimizer (BaseOptimizer | Undefined) – an optimizer for tuning.
  • timeout (float) – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.
  • poll_timeout (int) – the maximum time to wait while polling for completion of the tuning task. Defaults to 259200 seconds (72 hours).
  • poll_interval (float) – the interval between polling attempts during the tuning process. Defaults to 60 seconds.

Return type

Self

async attach_tune_deferred(task_id, *, timeout=60)

Attach a deferred tuning task using its task id.

Parameters

  • task_id (str) – the id of the deferred tuning task to attach to.
  • timeout (float) – the timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncTuningTask[AsyncGPTModel]

property batch: BatchSubdomainTypeT

property config: ConfigTypeT

configure(*, temperature=Undefined, max_tokens=Undefined, reasoning_mode=Undefined, response_format=Undefined, tools=Undefined, parallel_tool_calls=Undefined, tool_choice=Undefined)

Configures the model with specified parameters.

Parameters

  • temperature (float | Undefined) – a sampling temperature to use - higher values mean more random results. Should be a double number between 0 (inclusive) and 1 (inclusive).
  • max_tokens (int | Undefined) – a maximum number of tokens to generate in the response.
  • reasoning_mode (int | str | ReasoningMode | Undefined) – the mode of reasoning to apply during generation, allowing the model to perform internal reasoning before responding. Read more about possible modes in the reasoning documentation.
  • response_format (Literal['json'] | ~yandex_cloud_ml_sdk._types.schemas.JsonSchemaResponseType | type | ~yandex_cloud_ml_sdk._types.misc.Undefined) – a format of the response returned by the model. Could be a JsonSchema, a JSON string, or a pydantic model. Read more about possible response formats in the structured output documentation_BaseGPTModel_URL.
  • tools (Sequence[FunctionTool] | FunctionTool | Undefined) – tools to use for completion. Can be a sequence or a single tool.
  • parallel_tool_calls (bool | Undefined) – whether to allow parallel calls to tools during completion. Defaults to true.
  • tool_choice (Literal['none', 'None', 'NONE', 'auto', 'Auto', 'AUTO', 'required', 'Required', 'REQUIRED'] | ~yandex_cloud_ml_sdk._types.tools.function.FunctionDictType | ~yandex_cloud_ml_sdk._tools.tool.FunctionTool | ~yandex_cloud_ml_sdk._types.misc.Undefined) – the strategy for choosing tools. There are several ways to configure tool_choice for query processing: - no tools to call (tool_choice='none'); - required to call any tool (tool_choice='required'); - call a specific tool (tool_choice={'type': 'function', 'function': {'name': 'another_calculator'}} or directly passing a tool object).

Returns

new model instance with provided configuration.

Return type

Self

langchain(model_type='chat', timeout=60)

Initializes a langchain model based on the specified model type.

Parameters

  • model_type (Literal['chat']) – the type of langchain model to initialize. Defaults to "chat".
  • timeout (int) – the timeout which sets the default for the langchain model object. Defaults to 60 seconds.

Return type

BaseYandexLanguageModel

property uri: str

Was the article helpful?

© 2025 Direct Cursus Technology L.L.C.