Creating an AI assistant for RAG with source file and index metadata preserved
Note
We do not recommend using AI Assistant API in new projects. To create AI agents, use the Responses API.
AI Assistant API is an AI Studio tool for creating AI assistants. It can be used to create personalized assistants and implement retrieval augmented generation (RAG).
The Retrieval tool allows AI assistants to draw information from the knowledge base.
Getting started
To use an example:
- Create a service account and assign the `ai.assistants.editor` and `ai.languageModels.user` roles to it.
- Get and save the service account's API key with `yc.ai.foundationModels.execute` for its scope.

  The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.

  Note
  If you are using Windows, we recommend installing the WSL shell first and using it to proceed.

- Install Python 3.10 or higher.
- Install Python venv to create isolated virtual environments in Python.
- Create a new Python virtual environment and activate it:

      python3 -m venv new-env
      source new-env/bin/activate

- Use the pip package manager to install the ML SDK library:

      pip install yandex-cloud-ml-sdk
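The examples in this guide pass the folder ID and API key as string literals. A minimal sketch of reading them from environment variables instead, so that secrets stay out of the source file. The variable names `TOURS_FOLDER_ID` and `TOURS_API_KEY` and the `read_setting` helper are hypothetical, chosen for this illustration only.

```python
import os


def read_setting(name: str, env=None) -> str:
    """Return a required setting from the environment or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # Demo values are passed explicitly here; omit `env` to read os.environ.
    # The variable names are hypothetical.
    demo_env = {"TOURS_FOLDER_ID": "<folder_ID>", "TOURS_API_KEY": "<API_key>"}
    folder_id = read_setting("TOURS_FOLDER_ID", demo_env)
    api_key = read_setting("TOURS_API_KEY", demo_env)
    # These values would then replace the string literals passed to YCloudML.
    print("Settings loaded for folder", folder_id)
```

Failing early with a named variable in the error message tends to be easier to diagnose than an authentication error returned later by the API.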
Create an assistant
This example shows how to create an assistant that relies on information from files for its responses. In the example, we will create a vector search index and a very simple chat. The search index and the source files will each be assigned metadata that summarizes their contents.
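The script in this section assigns labels to files by their position in a list, which depends on the order in which the files are listed and on the directory holding exactly the expected files. A minimal sketch of one alternative that ties labels to file names instead. The `LABELS_BY_STEM` dictionary and the `labels_for` helper are hypothetical names introduced for this illustration; they are not part of the SDK.

```python
from pathlib import Path

# Hypothetical mapping from a file name stem to the labels for that file.
LABELS_BY_STEM = {
    "bali": {"bali": "File with the description of tours to Bali"},
    "kazakhstan": {
        "kazakhstan": "File with the description of the proposal for Kazakhstan"
    },
}


def labels_for(path: Path) -> dict[str, str]:
    """Return the labels for a file, or an empty dict if none are defined."""
    return LABELS_BY_STEM.get(path.stem.lower(), {})


if __name__ == "__main__":
    for name in ("bali.md", "kazakhstan.md", "notes.txt"):
        print(name, "->", labels_for(Path(name)))
```

With this approach, a file without an entry in the mapping gets an empty dictionary, so the caller can decide whether to skip it or upload it without labels.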
- Download and unpack the archive with examples of files that will be used as an additional source of information. The files contain advertising texts for tours to Bali and Kazakhstan generated by YandexGPT Pro.
- Create a file named `search-assistant.py` and paste the following code into it:

      import pathlib

      from yandex_cloud_ml_sdk import YCloudML
      from yandex_cloud_ml_sdk.search_indexes import (
          VectorSearchIndexType,
      )

      # Local path to the source files.
      mypath = "<path_to_files_with_examples>"

      # The `file_labels` variable contains metadata
      # that will be assigned to the source files.
      file_labels = [
          {"bali": "File with the description of tours to Bali"},
          {"kazakhstan": "File with the description of the proposal for Kazakhstan"},
      ]

      # The `index_label` variable contains metadata
      # that will be assigned to the search index.
      index_label = {
          "promo": "The index contains general information about Bali and Kazakhstan",
          "visas": "The index contains information on visa policies for entry to Bali and Kazakhstan",
      }


      def main():
          sdk = YCloudML(
              folder_id="<folder_ID>",
              auth="<API_key>",
          )

          # Upload the source files and assign them the metadata from `file_labels`.
          # The paths are sorted so that labels are paired with files
          # in a predictable (alphabetical) order.
          # The files will be stored for five days.
          paths = sorted(pathlib.Path(mypath).iterdir())
          files = []
          for file_count, path in enumerate(paths):
              file = sdk.files.upload(
                  path,
                  ttl_days=5,
                  expiration_policy="static",
                  name=str(path),
                  labels=file_labels[file_count],
              )
              files.append(file)

          # Create a vector search index and assign metadata to it.
          operation = sdk.search_indexes.create_deferred(
              files,
              index_type=VectorSearchIndexType(),
              name="the-bali-and-kazakhstan-index",
              labels=index_label,
          )
          # Wait for the search index to be created.
          search_index = operation.wait()

          # Create a tool to work with the search index
          # (or with several indexes, if there were more than one).
          tool = sdk.tools.search_index(search_index)

          # Create an assistant for the latest YandexGPT Pro model.
          # It will use the search index tool.
          assistant = sdk.assistants.create("yandexgpt", tools=[tool])
          thread = sdk.threads.create()

          # Stays None if the user exits before asking a question.
          result = None
          input_text = input(
              'Enter your question to the assistant ("exit" to end the dialog): '
          )
          while input_text.lower() != "exit":
              thread.write(input_text)
              # Give the whole thread content to the model.
              run = assistant.run(thread)
              # To get the result, wait until the run is complete.
              result = run.wait()
              # Display the response.
              print(f"Answer: {result.text}")
              input_text = input(
                  'Enter your question to the assistant ("exit" to end the dialog): '
              )

          # Display some attributes of the `citations` property for the last answer:
          # information about the source files used, their contents and assigned
          # metadata, as well as information about the index and its metadata.
          # You can use the assigned metadata (labels) to apply
          # additional filters to the resulting values.
          if result is not None:
              print("Citations:")
              for citation in result.citations:
                  for source in citation.sources:
                      print(f" {source.text=}")
                      print(f" {source.file.name=}")
                      print(f" {source.file.labels=}")
                      print(f" {source.search_index.name=}")
                      print(f" {source.search_index.labels=}")

          # Delete the resources you no longer need.
          search_index.delete()
          thread.delete()
          assistant.delete()
          for file in files:
              file.delete()


      if __name__ == "__main__":
          main()

  Where:

  - `mypath`: Variable containing the path to the directory with the files you downloaded earlier, e.g., `/Users/myuser/tours-example/`.
  - `<folder_ID>`: ID of the folder in which the service account was created.
  - `<API_key>`: Service account API key you got earlier, required for authentication in the API.
- Run the file you created:

      python3 search-assistant.py

  The example implements a very simple chat: type your questions to the assistant and get answers. To end the dialog, enter `exit`.

  Approximate result:
      Enter your question to the assistant ("exit" to end the dialog): How much is a visa to Bali?
      Answer: The cost of a visa to Bali is 300 rubles.
      Enter your question to the assistant ("exit" to end the dialog): And how could someone get to Kazakhstan?
      Answer: To get to Kazakhstan from Russia, you need a passport that is valid for at least three months after the trip ends, a migration card (issued on the plane or at the border), and it is also recommended to have travel insurance.
      Enter your question to the assistant ("exit" to end the dialog): exit
      Citations:
       source.text='**Казахстан: путешествие в сердце Евразии**\n\nОткройте для себя Казахстан — удивительную страну, где встречаются Восток и Запад. Здесь вы сможете насладиться бескрайними степями, величественными горами, историческими памятниками и гостеприимством местных жителей. **Что нужно для поездки?** Чтобы попасть в Казахстан из России, вам потребуются следующие документы:\n* Загранпаспорт, срок действия которого составляет не менее 3 месяцев на момент окончания поездки. * Миграционная карта (выдаётся в самолете или на границе). * Медицинская страховка (не обязательна, но рекомендуется). Не упустите возможность посетить эту прекрасную страну и получить массу положительных эмоций! Бронируйте свой отдых в Казахстане уже сегодня! **Мы ждём вас!**'
       source.file.name='/Users/myuser/tours-example/kazakhstan.md'
       source.file.labels={'kazakhstan': 'File with the description of the proposal for Kazakhstan'}
       source.search_index.name='the-bali-and-kazakhstan-index'
       source.search_index.labels={'promo': 'The index contains general information about Bali and Kazakhstan', 'visas': 'The index contains information on visa policies for entry to Bali and Kazakhstan'}
       source.text='**Бали — райский уголок, где вас ждут незабываемые впечатления!**\n\nПриглашаем вас провести незабываемый отпуск на Бали! Этот волшебный остров в Индонезии славится своими прекрасными пляжами, уникальной культурой и гостеприимными жителями. Здесь вы сможете насладиться красотой природы, попробовать местную кухню и познакомиться с новыми людьми. **Что нужно для поездки?** Для въезда на территорию Индонезии вам потребуется виза. Вот список документов, которые необходимы для её оформления:\n* Загранпаспорт, срок действия которого составляет не менее 6 месяцев на момент въезда в страну. * Две фотографии, соответствующие требованиям консульства. * Подтверждение бронирования отеля или письмо другого жилья. * Бронь или билеты туда и обратно. * Анкета, заполненная на английском языке. Обратите внимание, что требования могут меняться, поэтому перед поездкой рекомендуется проверить актуальную информацию на сайте консульства или визового центра. Стоимость визы 300 рублей. Не упустите возможность посетить этот прекрасный остров и получить массу положительных эмоций! Бронируйте свой отдых на Бали уже сегодня! **Мы ждём вас!**'
       source.file.name='/Users/myuser/tours-example/bali.md'
       source.file.labels={'bali': 'File with the description of tours to Bali'}
       source.search_index.name='the-bali-and-kazakhstan-index'
       source.search_index.labels={'promo': 'The index contains general information about Bali and Kazakhstan', 'visas': 'The index contains information on visa policies for entry to Bali and Kazakhstan'}

In the `result.text` property (where `result` is the object returned by `run.wait()`), the AI assistant returned the model-generated response based on the uploaded knowledge base. The `result.citations` property contains source citations, i.e., information about the knowledge base files and search indexes used to generate the response. For each entry in `citation.sources`, this includes the metadata of the source file (the `file.labels` property) and of the index (the `search_index.labels` property).
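The labels returned with each citation source can drive simple filtering on the client side. A minimal sketch, assuming each source exposes `file.labels` as a dictionary, as in the output above. The `File` and `Source` classes are stand-ins introduced so the snippet runs without the SDK, and `sources_with_label` is a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    """Stand-in for a source file with a name and labels."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Source:
    """Stand-in for one citation source: a text fragment and its file."""
    text: str
    file: File


def sources_with_label(sources, key):
    """Keep only the sources whose file carries the given label key."""
    return [s for s in sources if key in (s.file.labels or {})]


if __name__ == "__main__":
    sources = [
        Source("Visa info", File("bali.md", {"bali": "File with the description of tours to Bali"})),
        Source("Entry documents", File("kazakhstan.md", {"kazakhstan": "File with the description of the proposal for Kazakhstan"})),
    ]
    for source in sources_with_label(sources, "bali"):
        print(source.file.name)
```

With real results, one option is to collect the entries of `citation.sources` across `result.citations` into a single list and pass that list to the helper.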