Integrating AI Studio with Data Catalog

Written by

Sergey Kanunnikov

Improved by

Danila N.

Updated at June 26, 2026

Getting started
- Required paid resources
Set up your infrastructure
- Create a folder and network
- Create a service account
Prepare the metadata catalog
Connect an external MCP server
- Connecting in AI Studio
- Connecting to an external AI agent
Test a conversation with the agent

You can use an AI assistant to search metadata catalogs deployed in Data Catalog and analyze patterns in them. To do that, connect the Data Catalog MCP server to MCP Hub. The server lets you request the list of metadata catalogs, search through metadata, and obtain its table- and column-level lineage graph to use as context in conversations with agents.
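As a rough illustration of how an agent talks to an MCP server: the Model Context Protocol exchanges JSON-RPC 2.0 messages such as `tools/list` and `tools/call`. The sketch below only builds such payloads and does not send them anywhere. The tool name `search_metadata` and its `query` argument are hypothetical placeholders, not the actual tool names exposed by the Data Catalog MCP server.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request in a session gets its own ID


def mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request in the shape MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Call one tool. The tool name and its arguments are hypothetical.
call_tool = mcp_request(
    "tools/call",
    {"name": "search_metadata", "arguments": {"query": "sales"}},
)

print(json.dumps(call_tool, indent=2))
```

In practice, AI Studio or an external agent framework would build and send these messages for you; the sketch is only meant to show what kind of request sits behind a question to the agent.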

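To set up integration with Data Catalog in AI Studio:

Set up your infrastructure.
Prepare the metadata catalog.
Connect an external MCP server.
Test a conversation with the agent.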

Getting started

Navigate to the management console and log in to Yandex Cloud or create a new account.
On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

Learn more about clouds and folders here.

Required paid resources

The cost of the integration infrastructure includes a fee for Agent Atelier, which is based on the number of tokens in the request and response (see Yandex Cloud AI Studio pricing). You start paying for the agent as soon as you activate it.
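To get a feel for token-based billing, here is a minimal sketch of the arithmetic. It assumes a single price per 1,000 tokens applied to request and response tokens alike; the price used below is a made-up placeholder, so check the AI Studio pricing page for the actual rates and billing units.

```python
from decimal import Decimal


def estimate_cost(request_tokens, response_tokens, price_per_1k_tokens):
    """Estimate the fee for one exchange from its token counts.

    price_per_1k_tokens is a hypothetical rate, not an actual tariff.
    """
    total_tokens = request_tokens + response_tokens
    return Decimal(total_tokens) / Decimal(1000) * price_per_1k_tokens


# Placeholder rate: 0.50 currency units per 1,000 tokens.
example = estimate_cost(1200, 800, Decimal("0.50"))
print(example)
```

`Decimal` is used instead of `float` so that monetary amounts do not pick up binary rounding errors.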

Set up your infrastructure

Create a folder and network

Create a resource folder to host your metadata catalog.

Management console

In the management console, select a cloud and click Create folder.
Name your folder, e.g., data-folder.
Select Create a default network. This will create a network with subnets in each availability zone.
Click Create.

Learn more about clouds and folders.

Create a service account

Management console

Navigate to data-folder.
Navigate to Identity and Access Management.
Click Create service account.
Name the service account, e.g., sa-for-mcp-server.
Click Add role and assign the following roles to the service account:
- data-catalog.user for access to the metadata catalog resources.
- serverless.mcpGateways.invoker for access to the MCP server in MCP Hub.
- serverless.mcpGateways.anonymousInvoker for access to the external MCP server.
Click Create.
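The role list above can be kept as a small checklist. The sketch below does not call any Yandex Cloud API; it assumes you obtained the list of roles assigned to the service account by some other means and simply reports which of the three roles from this tutorial are missing.

```python
# Roles from this tutorial and what each one is for.
REQUIRED_ROLES = {
    "data-catalog.user": "access to the metadata catalog resources",
    "serverless.mcpGateways.invoker": "access to the MCP server in MCP Hub",
    "serverless.mcpGateways.anonymousInvoker": "access to the external MCP server",
}


def missing_roles(assigned_roles):
    """Return the required roles that are absent from assigned_roles."""
    assigned = set(assigned_roles)
    return sorted(role for role in REQUIRED_ROLES if role not in assigned)


# Example input: a service account that still lacks one role.
assigned = ["data-catalog.user", "serverless.mcpGateways.invoker"]
for role in missing_roles(assigned):
    print(f"Missing {role}: needed for {REQUIRED_ROLES[role]}")
```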

Prepare the metadata catalog

Create a metadata catalog

Management console

In the management console, select the resource folder where you want to create a metadata catalog.
Navigate to Yandex MetaData Hub.
In the left-hand panel, select Data Catalog.
Click Create catalog.
In the Name field, enter the catalog name: test-sales.
Click Create.

Note

When you create a metadata catalog, the metadata AI markup is on by default.

With this option enabled, the AI assistant suggests descriptions, domains, classifications and tags, glossaries and terms, and marks up your metadata using them. You can confirm, edit, or reject any suggestion your AI assistant makes by hovering over the AI icon next to the suggestion and selecting the action.

After the catalog is created, you can manage the AI markup on the Overview page or when updating the catalog.

Create a metadata source

Management console

In the left-hand panel, select Data sources.
Click Create data source.
Specify test-sales-source as the source name.
Select the type of the backend that will supply metadata for analysis. Once the source is created, you cannot change the database type. Available backends:
- PostgreSQL
- MySQL®
- ClickHouse®
- Yandex StoreDoc/MongoDB
- OpenSearch
- Greenplum®
- Yandex Data Transfer
- WebSQL
- DataLens
Specify the source parameters for the selected database type:
- Connection ID: Managed connection ID in Yandex Connection Manager.
- Database name: Name of the database to ingest metadata from.
Click Create.
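The source settings above can be summarized as a small record. This is only an illustrative sketch: the `DataSourceSpec` class and its field names are hypothetical and do not correspond to a Data Catalog API. The dataclass is frozen to mirror the fact that the backend type cannot be changed once the source is created.

```python
from dataclasses import dataclass

# Backend types listed in this tutorial.
SUPPORTED_BACKENDS = frozenset({
    "PostgreSQL", "MySQL", "ClickHouse", "Yandex StoreDoc/MongoDB",
    "OpenSearch", "Greenplum", "Yandex Data Transfer", "WebSQL", "DataLens",
})


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DataSourceSpec:
    name: str
    backend: str
    connection_id: str   # managed connection ID in Yandex Connection Manager
    database_name: str   # database to ingest metadata from

    def __post_init__(self):
        if self.backend not in SUPPORTED_BACKENDS:
            raise ValueError(f"Unsupported backend: {self.backend}")
        if not self.connection_id or not self.database_name:
            raise ValueError("connection_id and database_name are required")


# "<connection-id>" and "sales" are placeholders for your own values.
source = DataSourceSpec("test-sales-source", "PostgreSQL", "<connection-id>", "sales")
print(source)
```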

Create and start a data ingestion

Management console

In the left-hand panel, select Ingestions.
Click Create ingestion.
Specify the ingestion settings:
- In the Name field, enter load-sales as the ingestion name.
- Select the metadata source you created earlier.
- Specify the ingestion configuration for the data source:
  - Select Manually for the ingestion schedule.
  - Optionally, under Data Filters, use regular expressions to specify which databases and database objects to include in or exclude from the ingestion.
  - Under Metadata Types, select the metadata types to extract from the source.
  - Optionally, under Data Profiling:
    - Select Enable Profiling to perform data profiling, i.e., analysis and collection of statistics on the data being extracted.
    - Select Table level only to skip column-level profiling. With this option on, statistics are collected only for the table as a whole.
    - In the Max Workers field, specify the number of computing threads for profiling.
    - In the Sample Size field, specify the number of rows to sample for column profiling. This setting applies when the Use Sampling option is enabled.
    - In the Table size limit field, specify the table size in GB above which the table will be excluded from profiling.
    - In the Table row limit field, specify the number of rows above which the table will be excluded from profiling.
    - Select Enable field null count to get the number of rows with NULL for each column.
    - Select Enable distinct value count to get the number of unique values for each column.
    - Select Enable field min value to get the minimum value for each numeric column.
    - Select Enable field max value to get the maximum value for each numeric column.
    - Select Enable field mean value to get the mean value for each numeric column.
    - Select Enable field median value to get the median value for each numeric column.
    - Select Enable field value stddev to get the standard deviation for each numeric column.
    - Select Enable field quantiles to get quantiles for each numeric column.
    - Select Enable distinct value frequency count to get the frequency of unique values for each column.
    - Select Enable field histogram to get a histogram for each numeric column.
    - Select Enable field sample values to get sample values for each column.
    - Select Enable query joining to dynamically combine SQL queries for faster profiling.
    - In the Limit field, specify the maximum number of rows to profile. If set to 0, all rows will be profiled.
  - Under Metadata Processing, select the image for metadata processing:
    - Enable Use File Cache to improve ingestion performance.
Click Create.
In the list of ingestions, click the menu icon in the row with your new ingestion and select Start.
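The Data Filters setting in the ingestion configuration relies on include and exclude regular expressions. The sketch below shows one plausible way such filters combine. It assumes that a name must fully match at least one include pattern (when any are given) and must not match any exclude pattern; the exact matching rules used by Data Catalog are not described here, so treat this as an approximation for testing your patterns.

```python
import re


def select_objects(names, include=(), exclude=()):
    """Keep names that match an include pattern and no exclude pattern."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    selected = []
    for name in names:
        # With no include patterns, everything is included by default.
        included = not include_res or any(r.fullmatch(name) for r in include_res)
        excluded = any(r.fullmatch(name) for r in exclude_res)
        if included and not excluded:
            selected.append(name)
    return selected


# Hypothetical object names for illustration.
tables = ["sales.orders", "sales.orders_tmp", "sales.customers", "hr.salaries"]
picked = select_objects(tables, include=[r"sales\..*"], exclude=[r".*_tmp"])
print(picked)
```

Trying patterns against a list of known object names like this can catch an overly broad or overly narrow expression before you start an ingestion.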

During ingestion, the AI assistant will automatically mark up the data. Once successfully completed, the ingestion will get the Success status.
To view ingested and marked-up data, select Metadata search in the left-hand panel.

The page displays the info about the data, i.e., data source, database, and tables.
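If you enabled data profiling for the ingestion, the collected column statistics correspond to familiar aggregates. The sketch below computes a few of them for one numeric column with the Python standard library. It is not how Data Catalog implements profiling: the `profile_column` function is hypothetical, it uses the sample standard deviation and quartiles as one possible choice, and it treats `limit=0` as "profile all rows", as described for the Limit field.

```python
import statistics


def profile_column(values, limit=0):
    """Compute basic statistics for one numeric column (None means NULL)."""
    rows = values if limit == 0 else values[:limit]  # 0 means no row limit
    non_null = [v for v in rows if v is not None]
    profile = {
        "row_count": len(rows),
        "null_count": len(rows) - len(non_null),
        "distinct_count": len(set(non_null)),
    }
    if len(non_null) >= 2:  # stdev and quantiles need at least two values
        profile.update(
            min=min(non_null),
            max=max(non_null),
            mean=statistics.mean(non_null),
            median=statistics.median(non_null),
            stddev=statistics.stdev(non_null),            # sample std deviation
            quantiles=statistics.quantiles(non_null, n=4),  # quartile cut points
        )
    return profile


# Hypothetical column values with one NULL.
amounts = [10, 20, None, 20, 40, 30]
result = profile_column(amounts)
print(result)
```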

Note

The AI assistant automatically creates entities for metadata markup (domains, glossaries, tags, classifications, and terms) and their descriptions. You can confirm, edit, or reject the markup suggested by your AI assistant by hovering over the AI icon next to the suggestion and selecting the action.
