Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
  • Blog
  • Pricing
  • Documentation
Yandex project
© 2025 Yandex.Cloud LLC
Yandex Foundation Models
    • About Yandex Foundation Models
    • Multimodal models
    • Embeddings
    • Datasets
    • Fine-tuning
      • Overview
      • Search indexes
    • Quotas and limits
  • Yandex Cloud ML SDK
  • Compatibility with OpenAI
  • Access management
  • Pricing policy
  • Public materials
  • Release notes

In this article:

  • Assistant components
  • Handling external information sources
  • Source verification
  1. Concepts
  2. AI Assistant API
  3. Overview

AI Assistant API

Written by
Yandex Cloud
Updated at April 24, 2025
  • Assistant components
    • Handling external information sources
    • Source verification

The AI Assistant API feature is at the Preview stage.

AI Assistant API is a tool for creating AI assistants. It can be used to create personalized assistants, implement a generative response scenario with access to information from external sources (known as retrieval augmented generation, or RAG), and save the model's request context.

You can create your AI assistant using the Yandex Cloud ML SDK or through API requests in a programming language.

To use AI Assistant API in Yandex Foundation Models, you need the ai.assistants.editor and ai.languageModels.user roles or higher for the folder.

Assistant components

AI Assistant API offers a number of abstractions for building a custom chatbot or AI assistant.

Assistant determines which model to use and what parameters and instructions to apply. This enables you to configure the model just once and use those settings in the future without needing to provide them every time.

Threads are used to maintain the historical context of user communication. Each user chat makes an individual thread. By running your assistant for a specific thread, you call the model and provide it with all the context stored in the thread. Listen the current run for intermediate generation results; the final response, once generated, will become part of the thread.

Tip

By default, each time the model starts running, it will reprocess the content of the thread. If a thread holds some large context and you start the assistant after each user message, running it can grow rather expensive. To optimize costs, consider limiting the size of the context to provide: set the customPromptTruncationOptions parameter when starting your assistant.

For detailed costs of running an assistant, see Assistant pricing policy.

Handling external information sources

For your model to use external information sources to respond to requests, upload supplementary data files through the Files API and create a search index for them. You can upload up to 10,000 files with the maximum size of 128 MB per file. A single file can be included in multiple search indexes at the same time.

For all AI Assistant API limitations, see Quotas and limits in Yandex Foundation Models.

The upload feature supports the following MIME types:

  • application/json
  • application/msword
  • application/pdf
  • application/vnd.ms-excel
  • application/vnd.ms-excel.sheet.2
  • application/vnd.ms-excel.sheet.3
  • application/vnd.ms-excel.sheet.4
  • application/vnd.ms-excel.workspace.3
  • application/vnd.ms-excel.workspace.4
  • application/vnd.ms-outlook
  • application/vnd.ms-powerpoint
  • application/vnd.ms-project
  • application/vnd.ms-word2006ml
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • application/vnd.openxmlformats-officedocument.wordprocessingml.document
  • application/x-latex
  • application/x-ms-owner
  • application/xhtml+xml
  • text/csv
  • text/html
  • text/markdown
  • text/plain
  • text/xml
  • application/rtf

Markdown is the optimal search index source as many models are trained on this format and are more likely to interpret it correctly. Use the docling Python library to convert files, even those with complex formatting, into Markdown. To learn more, see Creating an AI assistant with search through PDF files with complex formatting.

Note

All uploaded files and search indexes are subject to expiration. When uploading a file, use the ExpirationConfig parameter to configure the expiration period. By default, a file not used for seven days is deleted.

When you create a search index, you define the type of search it will support. Full-text, vector, and hybrid search types are supported. Indexing may take from a few seconds to several hours depending on the file type and size as well as service load. The files are indexed asynchronously. The response to the request to create a serach index includes the operation ID. You can use it to find out when the search index will be ready.

Once a search index is created, you can configure an assistant to utilize it. In this case, the model will consider the contents of that search index and will primarily use information from it to generate responses.

Source verification

If the AI assistant uses search indexes with external information sources when generating a response, the model's answer contains source citations, i.e., the citations section with details about all the indexes, external files, and fragments used for the response.

When using ML SDK, source citations are available in the citations property of the run object. To get the source citations via the API, use the run ID and initiate the Run.Get REST API method or the RunService/Get gRPC API call. The section with source citations is also included in all assistant messages stored in the thread.

Structure of the citations section

The citations section that features links to the sources has the following structure:

  • sources: Array consisting of one or more fragments of source files that were used to generate the response:

    • chunk: Information on the file fragment that was used to generate the response:

      • searchIndex: Section of fields with information about the search index that includes the source file fragment used. This section contains the ID, type, metadata (labels), and other information about the index and its settings.
      • sourceFile: Section of fields with information about the source file, whose fragment was used to generate the response. This section contains the ID, metadata (labels), and other information about the source file.
      • content: Section of fields with the fragment text that was used to generate the response.

To learn more about the citations section structure, see the API reference.

Source file and search index metadata

For more efficient usage of source confirmation and data returned in the citations section, you can specify metadata for each source file and search index. You can use metadata to apply additional filters to results or to give more detailed and meaningful names and descriptions to the sources.

Source file metadata is specified when creating the files using the File.Create REST API method, FileService/Create gRPC API call, or Yandex Cloud ML SDK. Source file metadata is provided as objects containing <key>:<value> pairs.

Search index metadata is specified when creating the indexes using the SearchIndex.Create REST API method, SearchIndexService/Create gRPC API call, or Yandex Cloud ML SDK. Search index metadata is provided as objects containing <key>:<value> pairs.

The metadata of each source file and search index may comprise one or more <key>:<value> pairs.

For more information on using source file and search index metadata, see Creating a search assistant that uses file and index metadata.

Tip

Once the search index is created, you can delete the files. However, if you do this, the information about the sources will be lost, and the source section will be returned empty. To keep source citations, do not delete the files used to build the search index.

For examples of how to work with source citations using the SDK and API, and for output examples, see Creating an assistant with a search index.

See also

  • Creating a simple assistant
  • Creating an assistant with a search index
  • Creating a search assistant that uses file and index metadata
  • Creating an assistant with intermediate response generation results

Was the article helpful?

Previous
Fine-tuning
Next
Search indexes
Yandex project
© 2025 Yandex.Cloud LLC