Creating a search assistant that uses file and index metadata
The AI Assistant API feature is at the Preview stage.
AI Assistant API is a tool for creating AI assistants. It can be used to create personalized assistants, implement a generative response scenario with access to information from external sources (Retrieval Augmented Generation or RAG), and assign metadata sets to source files and search indexes for more efficient navigation through external sources.
Getting started
To run the example:
- Create a service account and assign the ai.assistants.editor and ai.languageModels.user roles to it.
- Get the service account API key and save it.
  The following examples use API key authentication. Yandex Cloud ML SDK also supports IAM token and OAuth token authentication. For more information, see Authentication in Yandex Cloud ML SDK.
- Use the pip package manager to install the ML SDK library:
    pip install yandex-cloud-ml-sdk
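Since the API key is saved for later use, one option is to keep it out of the script and read it from the environment. The following is a minimal sketch using only the standard library; the variable names YC_FOLDER_ID and YC_API_KEY and the load_credentials helper are assumptions made for this illustration and are not defined by the ML SDK.

```python
import os


def load_credentials(env=None):
    """Return (folder_id, api_key) read from environment variables.

    YC_FOLDER_ID and YC_API_KEY are hypothetical names chosen for this
    sketch; the ML SDK does not define them.
    """
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    required = ("YC_FOLDER_ID", "YC_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["YC_FOLDER_ID"], env["YC_API_KEY"]
```

The returned values can then be passed as the folder_id and auth arguments wherever the examples use the <folder_ID> and <API_key> placeholders.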
Create an assistant
This example shows how to create an assistant that relies on information from files for its responses. In the example, we will create a vector search index and a minimal chat. The search index and the source files will each get a set of metadata summarizing their contents.
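Because each source file gets its own metadata set, it can help to key the metadata by file name rather than by position in a directory listing, whose order is not guaranteed. The sketch below uses only the standard library; LABELS_BY_STEM and pair_files_with_labels are hypothetical names, and the keys assume files named bali.* and kazakhstan.*.

```python
import pathlib

# Metadata keyed by file stem (file name without its extension).
# The keys are an assumption about how the source files are named.
LABELS_BY_STEM = {
    "bali": {"bali": "File with the description of tours to Bali"},
    "kazakhstan": {
        "kazakhstan": "File with the description of the proposal for Kazakhstan"
    },
}


def pair_files_with_labels(paths, labels_by_stem):
    """Return (path, labels) pairs sorted by path.

    Raises KeyError if a file has no matching metadata set, so a stray
    file in the directory is noticed before anything is uploaded.
    """
    pairs = []
    for path in sorted(pathlib.Path(p) for p in paths):
        if path.stem not in labels_by_stem:
            raise KeyError(f"No labels defined for {path.name}")
        pairs.append((path, labels_by_stem[path.stem]))
    return pairs
```

Each resulting pair supplies the path and the labels for one file upload, independent of the order in which the files were listed.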
- Download and unpack the archive with examples of files that will be used as an additional source of information. The files contain advertising texts for tours to Bali and Kazakhstan generated by YandexGPT Pro.
- Create a file named search-assistant.py and paste the following code into it:

    import pathlib
    from yandex_cloud_ml_sdk import YCloudML
    from yandex_cloud_ml_sdk.search_indexes import (
        VectorSearchIndexType,
    )

    # Local path to the source files.
    mypath = "<path_to_files_with_examples>"

    # The `file_labels` variable contains metadata that will be assigned to the source files.
    file_labels = [
        {"bali": "File with the description of tours to Bali"},
        {"kazakhstan": "File with the description of the proposal for Kazakhstan"},
    ]

    # The `index_label` variable contains metadata that will be assigned to the search index.
    index_label = {
        "promo": "The index contains general information about Bali and Kazakhstan",
        "visas": "The index contains information on visa policies for entry to Bali and Kazakhstan",
    }


    def main():
        sdk = YCloudML(
            folder_id="<folder_ID>",
            auth="<API_key>",
        )

        # Load the source files and assign them the metadata from the `file_labels` variable.
        # The files will be stored for five days.
        # Sorting the paths keeps the file order aligned with `file_labels`
        # (`iterdir()` does not guarantee any particular order).
        paths = sorted(pathlib.Path(mypath).iterdir())
        files = []
        file_count = 0
        for path in paths:
            file = sdk.files.upload(
                path,
                ttl_days=5,
                expiration_policy="static",
                name=str(path),
                labels=file_labels[file_count],
            )
            files.append(file)
            file_count += 1

        # Creating an index for vector search and assigning metadata to the index.
        operation = sdk.search_indexes.create_deferred(
            files,
            index_type=VectorSearchIndexType(),
            name="the-bali-and-kazakhstan-index",
            labels=index_label,
        )

        # Waiting for the search index to be created.
        search_index = operation.wait()

        # Creating a tool to work with the search index.
        # Or even several indexes if that were the case.
        tool = sdk.tools.search_index(search_index)

        # Creating an assistant for the Latest YandexGPT Pro model.
        # It will use the search index tool.
        assistant = sdk.assistants.create("yandexgpt", tools=[tool])
        thread = sdk.threads.create()

        input_text = input(
            'Enter your question to the assistant ("exit" to end the dialog): '
        )

        while input_text.lower() != "exit":
            thread.write(input_text)

            # Giving the whole thread content to the model.
            run = assistant.run(thread)

            # To get the result, wait until the run is complete.
            result = run.wait()

            # Displaying the response on the screen.
            print(f"Answer: {result.text}")

            input_text = input(
                'Enter your question to the assistant ("exit" to end the dialog): '
            )

        # Displaying some of the attributes of the _citations_ property: information
        # about the used source files, their contents and metadata assigned,
        # as well as information about the index and its metadata.
        #
        # You can use the assigned metadata (labels) to apply additional filters
        # to the resulting values.
        print("Citations:")
        for citation in result.citations:
            for source in citation.sources:
                print(f"    {source.text=}")
                print(f"    {source.file.name=}")
                print(f"    {source.file.labels=}")
                print(f"    {source.search_index.name=}")
                print(f"    {source.search_index.labels=}")

        # Deleting things you no longer need.
        search_index.delete()
        thread.delete()
        assistant.delete()
        for file in files:
            file.delete()


    if __name__ == "__main__":
        main()

  Where:
  - mypath: Variable containing the path to the directory with the files you downloaded earlier, e.g., /Users/myuser/tours-example/.
  - <folder_ID>: ID of the folder in which the service account was created.
  - <API_key>: Service account API key you got earlier; it is required for authentication in the API.
- Run the created file:
    python3 search-assistant.py
  The example implements a minimal chat: type your questions to the assistant and get answers. To end the dialog, enter exit.
  Approximate result:

    Enter your question to the assistant ("exit" to end the dialog): How much is a visa to Bali?
    Answer: The cost of a visa to Bali is 300 rubles.
    Enter your question to the assistant ("exit" to end the dialog): And how could someone get to Kazakhstan?
    Answer: To get to Kazakhstan from Russia, you need a passport that is valid for at least three months after the trip ends, a migration card (issued on the plane or at the border), and it is also recommended to have travel insurance.
    Enter your question to the assistant ("exit" to end the dialog): exit
    Citations:
        source.text='**Казахстан: путешествие в сердце Евразии**\n\nОткройте для себя Казахстан — удивительную страну, где встречаются Восток и Запад. Здесь вы сможете насладиться бескрайними степями, величественными горами, историческими памятниками и гостеприимством местных жителей. **Что нужно для поездки?** Чтобы попасть в Казахстан из России, вам потребуются следующие документы:\n* Загранпаспорт, срок действия которого составляет не менее 3 месяцев на момент окончания поездки. * Миграционная карта (выдаётся в самолете или на границе). * Медицинская страховка (не обязательна, но рекомендуется). Не упустите возможность посетить эту прекрасную страну и получить массу положительных эмоций! Бронируйте свой отдых в Казахстане уже сегодня! **Мы ждём вас!**'
        source.file.name='/Users/myuser/tours-example/kazakhstan.md'
        source.file.labels={'kazakhstan': 'File with the description of the proposal for Kazakhstan'}
        source.search_index.name='the-bali-and-kazakhstan-index'
        source.search_index.labels={'promo': 'The index contains general information about Bali and Kazakhstan', 'visas': 'The index contains information on visa policies for entry to Bali and Kazakhstan'}
        source.text='**Бали — райский уголок, где вас ждут незабываемые впечатления!**\n\nПриглашаем вас провести незабываемый отпуск на Бали! Этот волшебный остров в Индонезии славится своими прекрасными пляжами, уникальной культурой и гостеприимными жителями. Здесь вы сможете насладиться красотой природы, попробовать местную кухню и познакомиться с новыми людьми. **Что нужно для поездки?** Для въезда на территорию Индонезии вам потребуется виза. Вот список документов, которые необходимы для её оформления:\n* Загранпаспорт, срок действия которого составляет не менее 6 месяцев на момент въезда в страну. * Две фотографии, соответствующие требованиям консульства. * Подтверждение бронирования отеля или письмо другого жилья. * Бронь или билеты туда и обратно. * Анкета, заполненная на английском языке. Обратите внимание, что требования могут меняться, поэтому перед поездкой рекомендуется проверить актуальную информацию на сайте консульства или визового центра. Стоимость визы 300 рублей. Не упустите возможность посетить этот прекрасный остров и получить массу положительных эмоций! Бронируйте свой отдых на Бали уже сегодня! **Мы ждём вас!**'
        source.file.name='/Users/myuser/tours-example/bali.md'
        source.file.labels={'bali': 'File with the description of tours to Bali'}
        source.search_index.name='the-bali-and-kazakhstan-index'
        source.search_index.labels={'promo': 'The index contains general information about Bali and Kazakhstan', 'visas': 'The index contains information on visa policies for entry to Bali and Kazakhstan'}

In the result.text property, the AI assistant returned the model-generated response based on the uploaded knowledge base. The result.citations property contains source citations, i.e., information about the knowledge base files and search indexes used to generate the response, including the metadata of each source file (source.file.labels property) and of the index (source.search_index.labels property).
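The code comments note that the assigned labels can be used to filter the returned values. A minimal sketch of such a filter follows. The Fake* classes are stand-ins invented for this illustration; they mimic only the attributes the example prints (sources, file.labels, search_index.labels) and are not ML SDK types. The sources_with_file_label helper is likewise hypothetical.

```python
from dataclasses import dataclass, field


# Stand-in classes mirroring only the attributes printed in the example.
# They are not part of the ML SDK.
@dataclass
class FakeFile:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class FakeIndex:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class FakeSource:
    text: str
    file: FakeFile
    search_index: FakeIndex


@dataclass
class FakeCitation:
    sources: list


def sources_with_file_label(citations, label_key):
    """Return every source whose file metadata contains label_key."""
    return [
        source
        for citation in citations
        for source in citation.sources
        # Treat missing labels (None) the same as an empty dict.
        if label_key in (source.file.labels or {})
    ]
```

Assuming the real citation objects expose the same attributes as printed above, the helper could be called as sources_with_file_label(result.citations, "bali") to keep only the fragments taken from the Bali file.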
See also
- Creating a simple assistant
- Creating an assistant with a search index
- Creating an AI assistant with search through PDF files with complex formatting
- Examples of working with ML SDK on GitHub