Assistants domain

Updated at September 25, 2025

class yandex_cloud_ml_sdk._assistants.domain.AsyncAssistants

Base class for assistants management.

Provides common functionality for creating, getting and listing assistants.

async create(model, *, temperature=Undefined, max_tokens=Undefined, instruction=Undefined, max_prompt_tokens=Undefined, prompt_truncation_strategy=Undefined, name=Undefined, description=Undefined, labels=Undefined, ttl_days=Undefined, tools=Undefined, expiration_policy=Undefined, response_format=Undefined, timeout=60)

Create a new assistant instance.

Parameters

model (str | BaseGPTModel) – Model ID or BaseGPTModel instance
temperature (UndefinedOr[float]) – Sampling temperature; higher values produce more random results. Should be a floating-point number between 0 and 1, inclusive.
max_tokens (UndefinedOr[int]) – Maximum number of tokens to generate
instruction (UndefinedOr[str]) – System instruction for the assistant
max_prompt_tokens (UndefinedOr[int]) – Maximum tokens allowed in prompt
prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Strategy for prompt truncation
name (UndefinedOr[str]) – Assistant name
description (UndefinedOr[str]) – Assistant description
labels (UndefinedOr[dict[str, str]]) – Additional labels associated with the assistant
ttl_days (UndefinedOr[int]) – Time-to-live in days
tools (UndefinedOr[Iterable[BaseTool]]) – Tools to use for completion. Can be a sequence or a single tool.
expiration_policy (UndefinedOr[ExpirationPolicyAlias]) – Expiration policy for assistant
response_format (UndefinedOr[ResponseType]) – A format of the response returned by the model. Could be a JsonSchema, a JSON string, or a pydantic model. Read more about possible response formats in the structured output documentation.
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncAssistant
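
As an illustration of the signature above, here is a minimal sketch of a create call. It assumes the domain object is passed in by the caller (in the SDK it is presumably reachable as `sdk.assistants` on an async client, which this page does not state); the model name `yandexgpt`, the assistant name, and the label are placeholders. The `check_temperature` helper is hypothetical and only mirrors the documented 0 to 1 range.

```python
import asyncio


def check_temperature(value):
    """Reject temperatures outside the documented [0, 1] range."""
    if not 0 <= value <= 1:
        raise ValueError(f"temperature must be within [0, 1], got {value}")
    return value


async def create_support_assistant(assistants, model="yandexgpt", temperature=0.3):
    """Create an assistant via an object exposing the `create` coroutine.

    `assistants` is assumed to behave like AsyncAssistants; the model name,
    assistant name and label below are placeholders, not SDK defaults.
    """
    return await assistants.create(
        model,
        name="support-bot",
        instruction="Answer briefly and politely.",
        temperature=check_temperature(temperature),
        max_tokens=500,
        labels={"team": "support"},
        timeout=30,  # seconds; the documented default is 60
    )
```

Arguments that are not passed, such as `ttl_days` or `tools`, keep their `Undefined` default.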

async get(assistant_id, *, timeout=60)

Get an existing assistant by ID.

Parameters

assistant_id (str) – ID of the assistant to retrieve
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncAssistant

async list(*, page_size=Undefined, timeout=60)

List all assistants.

Parameters

page_size (int | Undefined) – Number of assistants per page
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncIterator[AsyncAssistant]
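
Because list returns an AsyncIterator, a lookup by name can stop as soon as a match is found. The sketch below assumes the result of `list(...)` can be consumed directly with `async for`, which the return type suggests but this page does not spell out; `find_assistant_by_name` is a hypothetical helper, not part of the SDK.

```python
import asyncio


async def find_assistant_by_name(assistants, name, page_size=50):
    """Return the first listed assistant whose `name` matches, else None.

    Assumes `assistants.list(...)` yields AsyncAssistant-like objects
    when iterated with `async for`.
    """
    async for assistant in assistants.list(page_size=page_size):
        if assistant.name == name:
            return assistant
    return None
```

When the ID is already known, `get(assistant_id)` avoids the scan entirely.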

class yandex_cloud_ml_sdk._assistants.assistant.AsyncAssistant

Base class providing read-only access to Yandex Cloud ML Assistant configuration and metadata.

This class implements the core interface for interacting with Yandex Cloud ML Assistant API in a read-only manner. It serves as the parent class for both synchronous (Assistant) and asynchronous (AsyncAssistant) implementations.

async update(*, model=Undefined, temperature=Undefined, max_tokens=Undefined, instruction=Undefined, max_prompt_tokens=Undefined, prompt_truncation_strategy=Undefined, name=Undefined, description=Undefined, labels=Undefined, ttl_days=Undefined, tools=Undefined, expiration_policy=Undefined, response_format=Undefined, timeout=60)

Update the assistant’s configuration with new parameters.

This method sends an update request to Yandex Cloud ML API to modify the assistant’s configuration. Only specified parameters will be updated, others remain unchanged.

Parameters

model (UndefinedOr[str | BaseGPTModel]) – New model URI or BaseGPTModel instance to use
temperature (UndefinedOr[float]) – Sampling temperature; higher values produce more random results. Should be a floating-point number between 0 and 1, inclusive.
max_tokens (UndefinedOr[int]) – Maximum number of tokens to generate
instruction (UndefinedOr[str]) – New instructions for the assistant
max_prompt_tokens (UndefinedOr[int]) – Maximum tokens allowed in the prompt
prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Strategy for truncating long prompts
name (UndefinedOr[str]) – New name for the assistant
description (UndefinedOr[str]) – New description for the assistant
labels (UndefinedOr[dict[str, str]]) – New key-value labels for the assistant
ttl_days (UndefinedOr[int]) – Time-to-live in days before automatic deletion
tools (UndefinedOr[Iterable[BaseTool]]) – Tools to use for completion. Can be a sequence or a single tool.
expiration_policy (UndefinedOr[ExpirationPolicyAlias]) – Policy for handling expiration
response_format (UndefinedOr[ResponseType]) – A format of the response returned by the model. Could be a JsonSchema, a JSON string, or a pydantic model. Read more about possible response formats in the structured output documentation.
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Self
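
Since only the specified parameters are updated, a caller can build the keyword arguments dynamically and leave everything else untouched. The following is a sketch under that reading; `retune_assistant` is a hypothetical wrapper, and using `None` to mean "not provided" is a convention of this example, not of the SDK.

```python
import asyncio


async def retune_assistant(assistant, *, name=None, instruction=None, temperature=None):
    """Send only the fields the caller actually provided.

    Arguments left as None are dropped here, so `update` never receives
    them and the corresponding settings are not part of the request.
    """
    candidates = {"name": name, "instruction": instruction, "temperature": temperature}
    changes = {key: value for key, value in candidates.items() if value is not None}
    if not changes:
        return assistant  # nothing to change, skip the request
    return await assistant.update(**changes)
```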

async delete(*, timeout=60)

Delete the assistant from Yandex Cloud ML.

Sends a delete request to the Yandex Cloud ML API to remove the assistant. After successful deletion, marks the assistant as deleted internally.

Parameters

timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

None
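
Short-lived assistants are easy to leak when an exception interrupts the code before delete is called. One possible guard, sketched here with the standard-library `asynccontextmanager`, creates the assistant on entry and always deletes it on exit; `temporary_assistant` is a hypothetical helper, and `assistants` is assumed to be an AsyncAssistants-like object.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def temporary_assistant(assistants, model, **create_kwargs):
    """Create an assistant and delete it when the `async with` block exits."""
    assistant = await assistants.create(model, **create_kwargs)
    try:
        yield assistant
    finally:
        # Runs on normal exit and when the block raises.
        await assistant.delete(timeout=30)
```

It would be used as `async with temporary_assistant(assistants, model, name="tmp") as assistant: ...`.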

async list_versions(page_size=Undefined, page_token=Undefined, timeout=60)

List all versions of the assistant.

This method retrieves historical versions of the assistant in a paginated manner.

Parameters

page_size (int | Undefined) – Maximum number of versions to return per page
page_token (str | Undefined) – Token for pagination
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncIterator[AssistantVersion]
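
As a sketch of consuming the paginated iterator, the helper below collects a bounded number of versions and stops iterating once it has enough. It assumes `list_versions(...)` can be iterated with `async for`; the fields of AssistantVersion are not described on this page, so the example does not touch them. `first_versions` is a hypothetical name.

```python
import asyncio


async def first_versions(assistant, limit=5, page_size=10):
    """Collect at most `limit` versions from `assistant.list_versions`."""
    versions = []
    if limit <= 0:
        return versions
    async for version in assistant.list_versions(page_size=page_size):
        versions.append(version)
        if len(versions) >= limit:
            break  # stop early instead of draining every page
    return versions
```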

async run(thread, *, custom_temperature=Undefined, custom_max_tokens=Undefined, custom_max_prompt_tokens=Undefined, custom_prompt_truncation_strategy=Undefined, custom_response_format=Undefined, timeout=60)

Execute a non-streaming run with the assistant on the given thread.

Parameters

thread (str | AsyncThread) – Thread ID or Thread object to run on
custom_temperature (UndefinedOr[float]) – Override for model temperature
custom_max_tokens (UndefinedOr[int]) – Override for max tokens to generate
custom_max_prompt_tokens (UndefinedOr[int]) – Override for max prompt tokens
custom_prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Override for prompt truncation strategy
custom_response_format (UndefinedOr[ResponseType]) – Override for response format
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncRun
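
The `custom_*` arguments override the assistant's stored settings for a single run. A minimal sketch follows, assuming the returned AsyncRun exposes an awaitable `wait()` that resolves to the final result (AsyncRun is not documented on this page, so treat that call as an assumption); `ask_once` is a hypothetical helper.

```python
import asyncio


async def ask_once(assistant, thread, *, temperature=0.1, max_tokens=200):
    """Start a non-streaming run and wait for its final result.

    `thread` may be a thread ID or a thread object, as documented for `run`.
    `run.wait()` is an assumption about AsyncRun.
    """
    run = await assistant.run(
        thread,
        custom_temperature=temperature,  # overrides the stored temperature for this run
        custom_max_tokens=max_tokens,    # overrides the stored max_tokens for this run
        timeout=60,
    )
    return await run.wait()
```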

async run_stream(thread, *, custom_temperature=Undefined, custom_max_tokens=Undefined, custom_max_prompt_tokens=Undefined, custom_prompt_truncation_strategy=Undefined, custom_response_format=Undefined, timeout=60)

Execute a streaming run with the assistant on the given thread.

Parameters

thread (str | AsyncThread) – Thread ID or Thread object to run on
custom_temperature (UndefinedOr[float]) – Override for model temperature
custom_max_tokens (UndefinedOr[int]) – Override for max tokens to generate
custom_max_prompt_tokens (UndefinedOr[int]) – Override for max prompt tokens
custom_prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Override for prompt truncation strategy
custom_response_format (UndefinedOr[ResponseType]) – Override for response format
timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

AsyncRun
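
For a streaming run, intermediate events would be consumed as they arrive rather than after completion. The sketch below assumes the AsyncRun returned by run_stream can be iterated with `async for`; the shape of each event is not documented on this page, so events are handed to a caller-supplied callback untouched. `stream_events` and `on_event` are hypothetical names.

```python
import asyncio


async def stream_events(assistant, thread, on_event):
    """Start a streaming run and pass each event to `on_event` as it arrives.

    Iterating the run with `async for` is an assumption about AsyncRun.
    Returns the number of events seen.
    """
    run = await assistant.run_stream(thread)
    count = 0
    async for event in run:
        on_event(event)
        count += 1
    return count
```

Passing `print` as `on_event` is enough to watch the events in a console.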

property max_prompt_tokens: int | None

Returns the maximum number of prompt tokens allowed for the assistant.

name: str | None

The name of the assistant.

description: str | None

The description of the assistant.

created_by: str

The identifier of the user who created the assistant.

created_at: datetime

The timestamp when the assistant was created.

updated_by: str

The identifier of the user who last updated the assistant.

updated_at: datetime

The timestamp when the assistant was last updated.

expires_at: datetime

The timestamp when the assistant will expire.

labels: dict[str, str] | None

Additional labels associated with the assistant.

expiration_config: ExpirationConfig

Expiration configuration for the assistant.

model: BaseGPTModel

The GPT model used by the assistant.

instruction: str | None

Instructions or guidelines that the assistant should follow. These instructions guide the assistant’s behavior and responses.

prompt_truncation_options: PromptTruncationOptions

Options for truncating thread messages. Controls how messages are truncated when forming the prompt.

tools: tuple[BaseTool, ...]

Tools available to the assistant. Can be a sequence or a single tool. Tools must implement the BaseTool interface.

response_format: ResponseType | None

A format of the response returned by the model. Could be a JsonSchema, a JSON string, or a pydantic model.

id: str

The unique identifier of the assistant.
