© 2025 Direct Cursus Technology L.L.C.

In this article:

  • class yandex_cloud_ml_sdk._assistants.domain.Assistants
  • class yandex_cloud_ml_sdk._assistants.assistant.Assistant

Assistants domain

Written by
Yandex Cloud
Updated on September 25, 2025

class yandex_cloud_ml_sdk._assistants.domain.Assistants

Base class for assistants management.

Provides common functionality for creating, getting and listing assistants.

create(model, *, temperature=Undefined, max_tokens=Undefined, instruction=Undefined, max_prompt_tokens=Undefined, prompt_truncation_strategy=Undefined, name=Undefined, description=Undefined, labels=Undefined, ttl_days=Undefined, tools=Undefined, expiration_policy=Undefined, response_format=Undefined, timeout=60)

Create a new assistant instance.

Parameters

  • model (str | BaseGPTModel) – Model ID or BaseGPTModel instance
  • temperature (UndefinedOr[float]) – Sampling temperature; higher values produce more random results. Must be a floating-point number between 0 and 1, inclusive.
  • max_tokens (UndefinedOr[int]) – Maximum number of tokens to generate
  • instruction (UndefinedOr[str]) – System instruction for the assistant
  • max_prompt_tokens (UndefinedOr[int]) – Maximum tokens allowed in prompt
  • prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Strategy for prompt truncation
  • name (UndefinedOr[str]) – Assistant name
  • description (UndefinedOr[str]) – Assistant description
  • labels (UndefinedOr[dict[str, str]]) – Additional labels associated with the assistant
  • ttl_days (UndefinedOr[int]) – Time-to-live in days
  • tools (UndefinedOr[Iterable[BaseTool]]) – Tools to use for completion. Can be a sequence or a single tool.
  • expiration_policy (UndefinedOr[ExpirationPolicyAlias]) – Expiration policy for assistant
  • response_format (UndefinedOr[ResponseType]) – Format of the response returned by the model. Can be a JsonSchema, a JSON string, or a pydantic model. For the possible response formats, see the structured output documentation.
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Assistant
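
The call above can be sketched as follows. This is a minimal illustration, assuming `assistants` is an object that exposes the create() method documented here (how that object is obtained from the SDK is not shown on this page). The helper name `create_support_assistant` and every argument value are hypothetical.

```python
def create_support_assistant(assistants, model):
    """Create an assistant through an Assistants-like domain object.

    `assistants` is assumed to expose create() as documented above.
    The name, instruction, labels, and numeric values are illustrative only.
    """
    return assistants.create(
        model,                       # model ID string or BaseGPTModel instance
        name="support-bot",
        instruction="Answer briefly and politely.",
        temperature=0.2,             # documented range: 0 to 1, inclusive
        max_tokens=500,
        ttl_days=7,
        labels={"team": "support"},
        timeout=30,                  # seconds; the documented default is 60
    )
```

Arguments that are left out (tools, response_format, and so on) keep their Undefined default, so they are not passed at all.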

get(assistant_id, *, timeout=60)

Get an existing assistant by ID.

Parameters

  • assistant_id (str) – ID of the assistant to retrieve
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Assistant

list(*, page_size=Undefined, timeout=60)

List all assistants.

Parameters

  • page_size (int | Undefined) – Number of assistants per page
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Iterator[Assistant]
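
Because list() returns an iterator of Assistant objects, a lookup by name can be written as a plain loop. The sketch below assumes the iterator yields assistants one by one across pages, as the return type suggests. The helper `find_assistant_by_name` is hypothetical and not part of the SDK.

```python
def find_assistant_by_name(assistants, name, page_size=50):
    """Return the first assistant whose `name` attribute equals `name`.

    `assistants` is assumed to expose list() as documented above.
    Returns None when no assistant matches.
    """
    for assistant in assistants.list(page_size=page_size):
        # `name` is documented as `str | None`, so unnamed assistants never match.
        if assistant.name == name:
            return assistant
    return None
```

When the ID is already known, get(assistant_id) is the direct alternative and avoids scanning the whole list.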

class yandex_cloud_ml_sdk._assistants.assistant.Assistant

Base class providing read-only access to Yandex Cloud ML Assistant configuration and metadata.

This class implements the core interface for interacting with Yandex Cloud ML Assistant API in a read-only manner. It serves as the parent class for both synchronous (Assistant) and asynchronous (AsyncAssistant) implementations.

update(*, model=Undefined, temperature=Undefined, max_tokens=Undefined, instruction=Undefined, max_prompt_tokens=Undefined, prompt_truncation_strategy=Undefined, name=Undefined, description=Undefined, labels=Undefined, ttl_days=Undefined, tools=Undefined, expiration_policy=Undefined, response_format=Undefined, timeout=60)

Update the assistant’s configuration with new parameters.

This method sends an update request to Yandex Cloud ML API to modify the assistant’s configuration. Only specified parameters will be updated, others remain unchanged.

Parameters

  • model (UndefinedOr[str | BaseGPTModel]) – New model URI or BaseGPTModel instance to use
  • temperature (UndefinedOr[float]) – Sampling temperature; higher values produce more random results. Must be a floating-point number between 0 and 1, inclusive.
  • max_tokens (UndefinedOr[int]) – Maximum number of tokens to generate
  • instruction (UndefinedOr[str]) – New instructions for the assistant
  • max_prompt_tokens (UndefinedOr[int]) – Maximum tokens allowed in the prompt
  • prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Strategy for truncating long prompts
  • name (UndefinedOr[str]) – New name for the assistant
  • description (UndefinedOr[str]) – New description for the assistant
  • labels (UndefinedOr[dict[str, str]]) – New key-value labels for the assistant
  • ttl_days (UndefinedOr[int]) – Time-to-live in days before automatic deletion
  • tools (UndefinedOr[Iterable[BaseTool]]) – Tools to use for completion. Can be a sequence or a single tool.
  • expiration_policy (UndefinedOr[ExpirationPolicyAlias]) – Policy for handling expiration
  • response_format (UndefinedOr[ResponseType]) – Format of the response returned by the model. Can be a JsonSchema, a JSON string, or a pydantic model. For the possible response formats, see the structured output documentation.
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Self
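
The rule that only specified parameters are updated relies on a sentinel default that is distinct from None. The following self-contained sketch illustrates that general pattern in plain Python. It is not the SDK's implementation: `UNDEFINED`, `_UndefinedType`, and `apply_update` are hypothetical names, and whether the real API accepts None to clear a field is not stated on this page.

```python
class _UndefinedType:
    """Sentinel meaning 'argument not supplied' (distinct from None)."""

    def __repr__(self):
        return "Undefined"


UNDEFINED = _UndefinedType()


def apply_update(current, *, name=UNDEFINED, description=UNDEFINED, ttl_days=UNDEFINED):
    """Return a copy of `current` with only the supplied fields replaced."""
    supplied = {"name": name, "description": description, "ttl_days": ttl_days}
    # Keep only arguments the caller actually passed, including explicit None.
    changes = {key: value for key, value in supplied.items() if value is not UNDEFINED}
    return {**current, **changes}
```

With the documented method, the corresponding call has the shape `assistant.update(description="New text")`: every other setting stays as it was.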

delete(*, timeout=60)

Delete the assistant from Yandex Cloud ML.

Sends a delete request to the Yandex Cloud ML API to remove the assistant. After successful deletion, marks the assistant as deleted internally.

Parameters

  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

None
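
For short-lived assistants (tests, one-off jobs), create() and delete() can be paired so that cleanup happens even when an error occurs. This is a sketch under the assumption that `assistants` exposes create() and that the returned object exposes delete() as documented on this page. The `temporary_assistant` helper is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def temporary_assistant(assistants, model, **create_kwargs):
    """Create an assistant, yield it, and delete it on exit."""
    assistant = assistants.create(model, **create_kwargs)
    try:
        yield assistant
    finally:
        # Runs on normal exit and when the with-body raises.
        assistant.delete()
```

Usage has the shape `with temporary_assistant(assistants, model, name="tmp") as assistant: ...`.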

list_versions(page_size=Undefined, page_token=Undefined, timeout=60)

List all versions of the assistant.

This method retrieves historical versions of the assistant in a paginated manner.

Parameters

  • page_size (int | Undefined) – Maximum number of versions to return per page
  • page_token (str | Undefined) – Token for pagination
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Iterator[AssistantVersion]
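
Since list_versions() returns an iterator, a caller can stop early without materializing the full history. The sketch below assumes the iterator is lazy. The order in which versions are yielded is not stated on this page, and the `take_versions` helper is hypothetical.

```python
from itertools import islice


def take_versions(assistant, limit, page_size=20):
    """Return at most `limit` versions from assistant.list_versions()."""
    versions = assistant.list_versions(page_size=page_size)
    # islice stops pulling from the iterator once `limit` items were taken.
    return list(islice(versions, limit))
```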

run(thread, *, custom_temperature=Undefined, custom_max_tokens=Undefined, custom_max_prompt_tokens=Undefined, custom_prompt_truncation_strategy=Undefined, custom_response_format=Undefined, timeout=60)

Execute a non-streaming run with the assistant on the given thread.

Parameters

  • thread (str | Thread) – Thread ID or Thread object to run on
  • custom_temperature (UndefinedOr[float]) – Override for model temperature
  • custom_max_tokens (UndefinedOr[int]) – Override for max tokens to generate
  • custom_max_prompt_tokens (UndefinedOr[int]) – Override for max prompt tokens
  • custom_prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Override for prompt truncation strategy
  • custom_response_format (UndefinedOr[ResponseType]) – Override for response format
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Run

run_stream(thread, *, custom_temperature=Undefined, custom_max_tokens=Undefined, custom_max_prompt_tokens=Undefined, custom_prompt_truncation_strategy=Undefined, custom_response_format=Undefined, timeout=60)

Execute a streaming run with the assistant on the given thread.

Parameters

  • thread (str | Thread) – Thread ID or Thread object to run on
  • custom_temperature (UndefinedOr[float]) – Override for model temperature
  • custom_max_tokens (UndefinedOr[int]) – Override for max tokens to generate
  • custom_max_prompt_tokens (UndefinedOr[int]) – Override for max prompt tokens
  • custom_prompt_truncation_strategy (UndefinedOr[PromptTruncationStrategyType]) – Override for prompt truncation strategy
  • custom_response_format (UndefinedOr[ResponseType]) – Override for response format
  • timeout (float) – The timeout, or the maximum time to wait for the request to complete in seconds. Defaults to 60 seconds.

Return type

Run
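
run() and run_stream() share the same per-run override parameters, so one thin wrapper can serve both. The sketch below is illustrative: `start_run` is a hypothetical helper, and the 0 to 1 range check on the temperature override is an assumption carried over from the temperature parameter of create() and update(). How the returned Run is awaited or iterated belongs to the Runs domain and is not shown here.

```python
def start_run(assistant, thread, *, stream=False, temperature=None, max_tokens=None):
    """Start a run on `thread`, passing only the overrides that were given."""
    overrides = {}
    if temperature is not None:
        # Assumed range, mirroring the documented `temperature` parameter.
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1, inclusive")
        overrides["custom_temperature"] = temperature
    if max_tokens is not None:
        overrides["custom_max_tokens"] = max_tokens
    method = assistant.run_stream if stream else assistant.run
    return method(thread, **overrides)
```

Overrides that are omitted are not forwarded, so the assistant's stored configuration applies to them.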

property max_prompt_tokens: int | None

Returns the maximum number of prompt tokens allowed for the assistant.

name: str | None

The name of the assistant.

description: str | None

The description of the assistant.

created_by: str

The identifier of the user who created the assistant.

created_at: datetime

The timestamp when the assistant was created.

updated_by: str

The identifier of the user who last updated the assistant.

updated_at: datetime

The timestamp when the assistant was last updated.

expires_at: datetime

The timestamp when the assistant will expire.

labels: dict[str, str] | None

Additional labels associated with the assistant.

expiration_config: ExpirationConfig

Expiration configuration for the assistant.

model: BaseGPTModel

The GPT model used by the assistant.

instruction: str | None

Instructions or guidelines that the assistant should follow. These instructions guide the assistant’s behavior and responses.

prompt_truncation_options: PromptTruncationOptions

Options for truncating thread messages. Controls how messages are truncated when forming the prompt.

tools: tuple[BaseTool, ...]

Tools available to the assistant. Can be a sequence or a single tool. Tools must implement the BaseTool interface.

response_format: ResponseType | None

A format of the response returned by the model. Could be a JsonSchema, a JSON string, or a pydantic model.

id: str

The unique identifier of the assistant.
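
The metadata attributes listed above are plain values, so they can be used directly in housekeeping code. As one example, the sketch below computes how long an assistant has left before it expires, based on the `expires_at` attribute. It assumes `expires_at` is a timezone-aware datetime, which this page does not confirm (subtracting a naive datetime from an aware one raises TypeError). The `days_until_expiry` helper is hypothetical.

```python
from datetime import datetime, timezone


def days_until_expiry(assistant, now=None):
    """Return the number of days (possibly fractional) until `expires_at`."""
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = assistant.expires_at - now
    return remaining.total_seconds() / 86400  # 86400 seconds per day
```

A negative result means the expiration time has already passed.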
