Date: 2025-12-03
Switching from the AI Assistant API to the Responses API
- Differences between the AI Assistant API and Responses API
- How to migrate a simple text assistant to the Responses API
- How to migrate an assistant with tools to the Responses API
The AI Assistant API let you create AI assistants that stored the conversation context in threads, could use the Retrieval and WebSearch tools, and returned intermediate responses from the model.
For new and current projects, we recommend the Responses API, a simple and flexible interface that can preserve the dialog context. The Responses API has built-in tools for file search and web search, offers its own feature set, and integrates with external tools via MCP servers.
Warning
Starting December 10, 2025, the AI Assistant API in Yandex Cloud AI Studio will no longer be supported, and on January 26, 2026, it will be discontinued completely. Migrate all your current projects to the Responses API before January 26, 2026.
Follow this guide to convert your existing AI assistants built with the AI Assistant API into Responses API-based AI agents.
A Responses API AI agent is a model instance with a specified configuration: an instruction, configured tools, and communication context. An AI agent determines the model's behavior and how it communicates with users and other systems.
Differences between the AI Assistant API and Responses API
The concepts and tools used in the AI Assistant API and Responses API are different:
| AI Assistant API | Responses API |
|---|---|
| Assistant: AI assistant as a resource. | When using the API, no separate resource is created; all settings are provided directly in the responses.create() method. In the management console, you can save the AI agent configuration with a unique ID and then use it in the Responses API. |
| Thread: Dialog thread. | There are no threads that contain the context of all messages. You can provide the conversation history as the context of the new response call in the previous_response_id field. |
| Run: Run of the AI assistant for a thread. | A response object is the result of the responses.create() method. Each response object is similar to a Run in the AI Assistant API which contains the final response. |
| Retrieval: Tool for search through indexes. | A built-in file_search tool. To run a search, specify an array of Vector Store indexes. |
| WebSearch: Web search tool. | A built-in web_search tool. You can specify the search domain and region. |
| Streaming: Getting intermediate model responses. | The client.responses.stream() method. |
Conceptual differences
Main conceptual differences between the Responses API and AI Assistant API:
- In the Responses API, there are no assistants as separate AI Studio resources.
  AI Assistant API
  In the AI Assistant API, you need to create an AI assistant just once. With that done, you can run it in different threads.
  Responses API
  In the Responses API, you need to specify the following for each request:
  - Model (model)
  - Instructions (instructions)
  - Tools to use (tools)
  - Model properties (temperature, max_output_tokens, etc.)
  To adapt your code to the Responses API, use one of the following options to save your model settings:
  - Export the configuration of your AI assistant from the AI Assistant API into your application code.
  - Specify and save the model configuration under Agent Atelier in the management console. Once you do that, you will be able to use it in your application code by specifying the ID of the saved agent in the request.
- Context is transmitted in messages (in the previous_response_id field), not in threads.
  AI Assistant API
  The AI Assistant API stores context in threads (thread), and each run (run) reads it over again.
  Responses API
  The Responses API implements a mechanism allowing you to transmit the ID of the previous message in the previous_response_id field to keep track of the message history.
  Note
  Message retention period is limited to 30 days from when they are created by the responses.create() method.
- The tools the Responses API comes with are built in and require no extra libraries.
  AI Assistant API
  The Retrieval and WebSearch components you find in the AI Assistant API are configured globally as assistant tools and use external sources and separate search indexes.
  Responses API
  In the Responses API, the file and web search scenarios are implemented via the tools field where you can customize your tool set for each request. This field may take the following values:
  {"type": "file_search"}
  {"type": "web_search"}
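The first option (exporting the assistant configuration into your application code) could look roughly like the sketch below. It is only an illustration under stated assumptions: AGENT_CONFIG and build_request() are hypothetical names that are not part of any SDK, and the values are placeholders. The keys mirror the per-request parameters listed above.

```python
# Hypothetical configuration exported from an existing AI assistant.
# The keys match the parameters that responses.create() expects per request.
AGENT_CONFIG = {
    "model": "gpt://<folder_ID>/<model_URI>",
    "instructions": "You are an internal corporate documentation assistant.",
    "tools": [{"type": "web_search"}],
    "temperature": 0.3,
    "max_output_tokens": 1000,
}


def build_request(config, user_input, previous_response_id=None):
    """Return keyword arguments for client.responses.create() (hypothetical helper)."""
    request = dict(config)  # shallow copy, so the shared config is not mutated
    request["input"] = user_input
    if previous_response_id is not None:
        # Only sent when there is earlier context to continue from.
        request["previous_response_id"] = previous_response_id
    return request


# Usage, assuming a client configured as in the examples of this guide:
# response = client.responses.create(**build_request(AGENT_CONFIG, "Hello"))
```

Keeping the configuration in one place plays the role the assistant resource used to play: every request reuses the same model, instructions, and tools without repeating them at each call site.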
How to migrate a simple text assistant to the Responses API
AI assistant workflow via the AI Assistant API
In the AI Assistant API, you go through these steps to use an assistant:
- Creating an AI assistant to store model settings, tools, and basic instructions.
- Creating a thread (dialog container).
- Creating a message in the thread (user message).
- Running the assistant to process the thread.
- Polling the run status until the run completes.
- Getting a message from the thread (model response).
AI agent workflow via the Responses API
In the Responses API, an AI agent is a set of parameters in the code, whereas the context of the previous dialog is provided via the previous_response_id field.
Your application logic must store the response.id identifier, just as it stored the thread in the AI Assistant API. To get a response based on the conversation history, provide the ID of the last response (response.id) in the previous_response_id field with each new user message.
Here is an example of how a simple Responses API-based AI agent works:
from openai import OpenAI

YANDEX_CLOUD_FOLDER = "<folder_ID>"
YANDEX_CLOUD_MODEL = "<model_URI>"
YANDEX_CLOUD_API_KEY = "<service_account_API_key>"
# or YANDEX_CLOUD_IAM_TOKEN = "<IAM_token>"

previous_id = None  # saving the ID of the last assistant response

client = OpenAI(
    api_key=YANDEX_CLOUD_API_KEY,
    project=YANDEX_CLOUD_FOLDER,
    base_url="https://rest-assistant.api.cloud.yandex.net/v1",
)

print("Agent chat (to exit, type 'exit')\n")

while True:
    user_input = input("You: ")
    if user_input.lower() in ("exit", "quit"):
        print("Chat session ended.")
        break

    response = client.responses.create(
        model=f"gpt://{YANDEX_CLOUD_FOLDER}/{YANDEX_CLOUD_MODEL}",
        input=[{"role": "user", "content": user_input}],
        instructions="You are a text agent that maintains a conversation and provides meaningful responses to the user’s questions.",
        previous_response_id=previous_id,  # providing context, if any
    )

    # saving the ID for the next step
    previous_id = response.id

    # outputting the agent's response
    print("Agent:", response.output_text)
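Because there is no thread object, an application that serves several users has to keep the last response.id for each conversation itself. The sketch below shows one in-memory way to do that. The ConversationStore class and the ask() function are hypothetical names, the 30-day limit is taken from the retention note earlier in this guide, and a real service would likely persist the mapping in a database instead.

```python
import time

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30-day retention period for stored messages


class ConversationStore:
    """Keeps the last response ID per conversation (hypothetical helper)."""

    def __init__(self):
        self._last = {}  # conversation_id -> (response_id, saved_at)

    def save(self, conversation_id, response_id, now=None):
        saved_at = time.time() if now is None else now
        self._last[conversation_id] = (response_id, saved_at)

    def previous_id(self, conversation_id, now=None):
        """Return the stored response ID, or None if it is missing or expired."""
        entry = self._last.get(conversation_id)
        if entry is None:
            return None
        response_id, saved_at = entry
        current = time.time() if now is None else now
        if current - saved_at >= RETENTION_SECONDS:
            del self._last[conversation_id]  # the context can no longer be used
            return None
        return response_id


def ask(client, store, conversation_id, model, text):
    """Send one user message and remember the new response ID."""
    response = client.responses.create(
        model=model,
        input=[{"role": "user", "content": text}],
        previous_response_id=store.previous_id(conversation_id),
    )
    store.save(conversation_id, response.id)
    return response.output_text
```

With this layout, the conversation_id chosen by your application takes the place of the thread ID, while the store tracks which response to continue from.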
How to migrate an assistant with tools to the Responses API
The process of migrating an AI assistant to the Responses API depends on what tools are connected and how you get the generation results.
RAG scenarios with Retrieval
In file and internal knowledge base search scenarios, you use the AI Assistant API search indexes and the Retrieval tool: the AI assistant generates responses based on documents uploaded to the indexes and returns the metadata of the files it used.
In the AI Assistant API, the Retrieval tool was linked to the assistant:
# First, a tool is created to work with the existing search index.
tool = sdk.tools.search_index(
    search_index,
    call_strategy={
        "type": "function",
        "function": {"name": "guide", "instruction": instruction},
    },
)

# Next, an assistant is created which is going to use that tool.
assistant = sdk.assistants.create(
    "yandexgpt",
    instruction="You are an internal corporate documentation assistant. Answer politely. If the information is not in the documents below, don't make up your answer.",
    tools=[tool],
)
thread = sdk.threads.create()
To migrate an AI assistant with the Retrieval tool connected, follow these steps:
- Upload all documents of the connected search index to the vector store used by the Responses API.
- When putting together a request in your application, add the file_search tool settings:
import openai
import json

YANDEX_CLOUD_FOLDER = "<folder_ID>"
YANDEX_CLOUD_MODEL = "<model_URI>"
VECTOR_STORE_ID = "<Vector_Store_instance_ID>"
YANDEX_CLOUD_API_KEY = "<service_account_API_key>"
# or YANDEX_CLOUD_IAM_TOKEN = "<IAM_token>"

client = openai.OpenAI(
    api_key=YANDEX_CLOUD_API_KEY,
    base_url="https://rest-assistant.api.cloud.yandex.net/v1",
    project=YANDEX_CLOUD_FOLDER,
)

response = client.responses.create(
    model=f"gpt://{YANDEX_CLOUD_FOLDER}/{YANDEX_CLOUD_MODEL}",
    instructions="You are a smart assistant. If asked about ..., search through the index for information",
    tools=[
        {
            "type": "file_search",
            "vector_store_ids": [VECTOR_STORE_ID],
        }
    ],
    input="what is ...",
)

print("Response text:")
print(response.output_text)
print("\n" + "=" * 50 + "\n")

# Full response
print("Full response (JSON):")
print(json.dumps(response.model_dump(), indent=2, ensure_ascii=False))
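If you relied on the file metadata that the Retrieval tool returned, you may want to pull similar information out of the full response. The sketch below assumes that the response follows the OpenAI Responses schema, where output items of type "message" hold content parts with an "annotations" list and file search sources appear as annotations of type "file_citation". Whether and how the service fills in these fields is an assumption to verify against a real response. collect_file_citations() is a hypothetical helper name.

```python
def collect_file_citations(response_dict):
    """Return all file_citation annotations found in a dumped response.

    Assumes the OpenAI Responses layout:
    output[] -> content[] -> annotations[] (an assumption, see the lead-in).
    """
    citations = []
    for item in response_dict.get("output") or []:
        if item.get("type") != "message":
            continue  # skip tool calls and other non-message output items
        for part in item.get("content") or []:
            for annotation in part.get("annotations") or []:
                if annotation.get("type") == "file_citation":
                    citations.append(annotation)
    return citations


# Usage with the response object from the example above:
# for citation in collect_file_citations(response.model_dump()):
#     print(citation)
```

Working on the dictionary from model_dump() rather than on typed attributes keeps the helper tolerant of fields that are missing or set to None.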
Web search scenarios
In the AI Assistant API, you used to configure the WebSearch tool when creating your AI assistant:
{
  "folderId": "<folder_ID>",
  "modelUri": "gpt://<folder_ID>/yandexgpt-lite/latest",
  "instruction": "You are a smart assistant designed for a finance company. Answer politely. Use search to answer the questions. Do not make up your answer.",
  "tools": [
    {
      "genSearch": {
        "options": {
          "site": {
            "site": [
              "https://cbr.ru/",
              "https://yandex.ru/finance/currencies"
            ]
          },
          "enableNrfmDocs": true
        },
        "description": "Tool to get information about official currency exchange rates."
      }
    }
  ]
}
In the Responses API, the web_search tool settings are provided directly in the request.
To migrate an AI assistant with WebSearch, provide the web_search tool settings in the request:
import openai

YANDEX_CLOUD_FOLDER = "<folder_ID>"
YANDEX_CLOUD_MODEL = "<model_URI>"
YANDEX_CLOUD_API_KEY = "<service_account_API_key>"
# or YANDEX_CLOUD_IAM_TOKEN = "<IAM_token>"

client = openai.OpenAI(
    api_key=YANDEX_CLOUD_API_KEY,
    base_url="https://rest-assistant.api.cloud.yandex.net/v1",
    project=YANDEX_CLOUD_FOLDER,
)

response = client.responses.create(
    model=f"gpt://{YANDEX_CLOUD_FOLDER}/{YANDEX_CLOUD_MODEL}",
    input="Create a brief overview of the latest LLM news in 2025, only facts, no speculation.",
    # Providing tool settings
    tools=[
        {
            "type": "web_search",
            "filters": {
                "allowed_domains": [
                    "habr.ru",
                ],
                "user_location": {
                    "region": "213",
                }
            }
        }
    ],
    temperature=0.3,
    max_output_tokens=1000,
)

# Outputting the response text
print(response.output_text)
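If your assistants restricted WebSearch to a list of sites, that list can be carried over into the web_search tool settings. The sketch below reads the genSearch site URLs from the old assistant configuration and builds a tool dictionary with allowed_domains. It assumes that allowed_domains expects bare domain names, as in the habr.ru example above, so URL paths are dropped. gen_search_to_web_search() is a hypothetical helper name.

```python
from urllib.parse import urlparse


def gen_search_to_web_search(assistant_config):
    """Build a web_search tool dict from an old genSearch configuration."""
    domains = []
    for tool in assistant_config.get("tools", []):
        sites = (
            tool.get("genSearch", {})
            .get("options", {})
            .get("site", {})
            .get("site", [])
        )
        for url in sites:
            # urlparse() only detects the host when a scheme or "//" is present.
            parsed = urlparse(url if "://" in url else "//" + url)
            host = parsed.hostname
            if host and host not in domains:
                domains.append(host)

    web_search = {"type": "web_search"}
    if domains:
        # Assumption: only domain names are accepted, so paths are not kept.
        web_search["filters"] = {"allowed_domains": domains}
    return web_search
```

For the assistant configuration shown earlier in this section, the function yields allowed_domains of cbr.ru and yandex.ru. The path restriction on the second site (/finance/currencies) is lost, so check whether the broader domain filter is acceptable for your scenario.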
Getting intermediate response generation results
The AI Assistant API allowed access to intermediate response generation results. For example, in ML SDK, the run_stream() method was used:
run = assistant.run_stream(thread)

# Intermediate results you are getting as the model is generating its response
for event in run:
    print(event._message.parts)

# All fields of the final result
print(f"run {event=}")
The Responses API also allows getting intermediate generation results, e.g., via the responses.stream() method:
import openai

YANDEX_CLOUD_FOLDER = "<folder_ID>"
YANDEX_CLOUD_MODEL = "<model_URI>"
YANDEX_CLOUD_API_KEY = "<service_account_API_key>"
# or YANDEX_CLOUD_IAM_TOKEN = "<IAM_token>"

client = openai.OpenAI(
    api_key=YANDEX_CLOUD_API_KEY,
    base_url="https://rest-assistant.api.cloud.yandex.net/v1",
    project=YANDEX_CLOUD_FOLDER,
)

# Creating a streaming request
with client.responses.stream(
    model=f"gpt://{YANDEX_CLOUD_FOLDER}/{YANDEX_CLOUD_MODEL}",
    input="Write a short friendly and funny birthday toast.",
) as stream:
    for event in stream:
        # Text response deltas
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
        # Event showing that the response is completed
        # elif event.type == "response.completed":
        #     print("\n---\nResponse completed")

    # You can get the full text of the response if you need to
    # final_response = stream.get_final_response()
    # print("\nFull response text:\n", final_response.output_text)
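If your application needs both the incremental output and the complete text (for example, to log the response after showing it to the user), the delta events can be accumulated while they are displayed. The sketch below is a small illustration of that pattern. collect_text() is a hypothetical helper name, and it relies only on the event.type and event.delta attributes used in the example above.

```python
def collect_text(events, on_delta=None):
    """Join text deltas from a stream of events into the full response text.

    on_delta, if given, is called with each text fragment as it arrives.
    """
    chunks = []
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
            if on_delta is not None:
                on_delta(event.delta)
    return "".join(chunks)


# Usage, assuming a client configured as in the example above:
# with client.responses.stream(model="...", input="...") as stream:
#     text = collect_text(stream, on_delta=lambda d: print(d, end="", flush=True))
```

Events of other types are ignored, so the helper keeps working if the stream also carries lifecycle events such as response.completed.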