© 2026 Direct Cursus Technology L.L.C.
Integrating AI Studio with Data Catalog

Written by
Sergey Kanunnikov
Improved by
Danila N.
Updated at March 5, 2026
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder and network
    • Create a service account
  • Prepare the metadata catalog
    • Create a metadata catalog
    • Create a metadata source
    • Create and start a data ingestion
  • Connect an external MCP server
    • Connecting in AI Studio
    • Connecting to an external AI agent
  • Test a conversation with the agent

You can use an AI assistant to search metadata catalogs deployed in Data Catalog and analyze patterns in them. To do this, connect the Data Catalog MCP server to MCP Hub. The server lets you list metadata catalogs, search metadata, and retrieve table- and column-level lineage graphs, which agents can then use as context in a conversation.
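For orientation, the sketch below shows roughly what an MCP client sends to this server. It assumes the Streamable HTTP transport from the Model Context Protocol specification: each JSON-RPC 2.0 message is POSTed to the `/mcp` URL used later in this tutorial, the client opens the session with `initialize`, confirms with `notifications/initialized`, and can then call `tools/list`. The protocol revision string, the client name, and the helper names (`jsonrpc`, `initialize_message`, `build_request`, `parse_response`) are illustrative assumptions; the actual tool names are whatever `tools/list` returns.

```python
import json
import urllib.request

# Data Catalog MCP endpoint from this tutorial.
MCP_URL = "https://datacatalog-consumer.mcp.cloud.yandex.net/mcp"
# Assumed MCP protocol revision; use one that both your client and the server support.
PROTOCOL_VERSION = "2025-03-26"


def jsonrpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message; without msg_id it is a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg


def initialize_message(msg_id=1):
    """First message of an MCP session; the clientInfo values are placeholders."""
    return jsonrpc(
        "initialize",
        {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "catalog-demo-client", "version": "0.1.0"},
        },
        msg_id=msg_id,
    )


def build_request(message, iam_token, session_id=None):
    """Wrap one JSON-RPC message in an HTTP POST (Streamable HTTP transport)."""
    headers = {
        "Content-Type": "application/json",
        # The server may reply with plain JSON or with an SSE stream.
        "Accept": "application/json, text/event-stream",
        "Authorization": f"Bearer {iam_token}",
    }
    if session_id:
        # Echo the session ID the server returned in its initialize response.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(message).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def parse_response(content_type, body):
    """Return the JSON-RPC messages in a JSON or text/event-stream body.

    Simplified: assumes each SSE event carries its JSON on a single data: line.
    """
    if content_type.startswith("text/event-stream"):
        return [
            json.loads(line[len("data:"):].strip())
            for line in body.splitlines()
            if line.startswith("data:")
        ]
    return [json.loads(body)]
```

In practice you would pass each request to `urllib.request.urlopen`: send `initialize_message()`, read the `Mcp-Session-Id` response header if one is present, then send `jsonrpc("notifications/initialized")` and `jsonrpc("tools/list", msg_id=2)` with that session ID, and decode each reply with `parse_response`.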

To set up integration with Data Catalog in AI Studio:

  1. Set up your infrastructure.
  2. Prepare the metadata catalog.
  3. Connect an external MCP server.
  4. Test a conversation with the agent.

Getting started

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or create a new account.
  2. On the Yandex Cloud Billing page, make sure you have a linked billing account with the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure.

Learn more about clouds and folders.

Required paid resources

The cost of the integration infrastructure includes a fee for Agent Atelier, based on the number of tokens in requests and responses (see Yandex AI Studio pricing). You start paying for the agent as soon as you activate it.

Set up your infrastructure

Create a folder and network

Create a resource folder to host your metadata catalog.

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Select Create a default network. This will create a network with subnets in each availability zone.
  4. Click Create.

Learn more about clouds and folders.

Create a service account

Management console
  1. Navigate to data-folder.

  2. In the list of services, select Identity and Access Management.

  3. Click Create service account.

  4. Name the service account, e.g., sa-for-mcp-server.

  5. Click Add role and assign the following roles to the service account:

    • data-catalog.user for access to the metadata catalog resources.
    • serverless.mcpGateways.invoker for access to the MCP server in MCP Hub.
    • serverless.mcpGateways.anonymousInvoker for access to the external MCP server.
  6. Click Create.

Prepare the metadata catalog

Create a metadata catalog

Management console
  1. In the management console, select the resource folder where you want to create a metadata catalog.
  2. Select Yandex MetaData Hub.
  3. In the left-hand panel, select Data Catalog.
  4. Click Creating a catalog.
  5. In the Name field, enter test-sales as the catalog name.
  6. Click Create.

Note

When you create a metadata catalog, the metadata AI markup is on by default.

With this option enabled, the AI assistant suggests descriptions, domains, classifications and tags, glossaries and terms, and marks up your metadata using them. You can confirm, edit, or reject any suggestion your AI assistant makes by hovering over the AI icon next to the suggestion and selecting the action.

After the catalog is created, you can manage the AI markup on the Overview page or when updating the catalog.

Create a metadata source

Management console
  1. In the left-hand panel, select Data sources.

  2. Click Create data source.

  3. Specify test-sales-source as the source name.

  4. Select the type of database that will supply metadata for analysis. You cannot change the database type after the source is created. Available backends:

    • PostgreSQL
    • MySQL®
    • ClickHouse®
    • Yandex Data Transfer
    • WebSQL
    • Yandex StoreDoc/MongoDB
    • OpenSearch
    • Greenplum®
  5. Specify the source parameters for the selected database type:

    • Connection ID: Managed connection ID in Yandex Connection Manager.
    • Database name: Name of the database to ingest metadata from.
  6. Click Create.

Create and start a data ingestion

Management console
  1. In the left-hand panel, select Ingestions.

  2. Click Create ingestion.

  3. Specify the ingestion settings:

    • In the Name field, enter load-sales as the ingestion name.

    • Select the metadata source you created earlier.

    • Specify the ingestion configuration for the data source:

      • Select Manually for the ingestion schedule.
      • Optionally, under Data Filters, use regular expressions to specify which databases and database objects to include in or exclude from the ingestion.
      • Under Metadata Types, select the metadata types to extract from the source.

      • Optionally, under Data Profiling:

        • Select Enable Profiling to perform data profiling, i.e., analysis and collection of statistics on the data being extracted.
        • Select Table level only to skip column-level profiling. With this option on, statistics are collected only for the table as a whole.
        • In the Max Workers field, specify the number of computing threads for profiling.
        • In the Sample Size field, specify the number of rows to sample for column profiling. This setting applies only when the Use Sampling option is enabled.
        • In the Table size limit field, specify the table size in GB above which the table will be excluded from profiling.
        • In the Table row limit field, specify the number of rows above which the table will be excluded from profiling.
        • Select Enable field null count to get the number of rows with NULL for each column.
        • Select Enable distinct value count to get the number of unique values for each column.
        • Select Enable field min value to get the minimum value for each numeric column.
        • Select Enable field max value to get the maximum value for each numeric column.
        • Select Enable field mean value to get the mean value for each numeric column.
        • Select Enable field median value to get the median value for each numeric column.
        • Select Enable field value stddev to get the standard deviation for each numeric column.
        • Select Enable field quantiles to get quantiles for each numeric column.
        • Select Enable distinct value frequency count to get the frequency of unique values for each column.
        • Select Enable field histogram to get a histogram for each numeric column.
        • Select Enable field sample values to get sample values for each column.
        • Select Enable query joining to dynamically combine SQL queries for faster profiling.
        • In the Limit field, specify the maximum number of rows to profile. If set to 0, all rows will be profiled.
      • Under Metadata Processing, select the image for metadata processing:

        • Enable Use File Cache to improve ingestion performance.
  4. Click Create.

  5. In the list of ingestions, click the actions icon in the row with your new ingestion and select Start.

    During ingestion, the AI assistant automatically marks up the data. When the ingestion completes successfully, its status changes to Success.

  6. To view the ingested and marked-up data, select Metadata search in the left-hand panel.

    The page shows information about the ingested data: the data source, database, and tables.

    Note

    The AI assistant automatically creates entities for metadata markup (domains, glossaries, tags, classifications, and terms) and their descriptions. You can confirm, edit, or reject the markup suggested by your AI assistant by hovering over the AI icon next to the suggestion and selecting the action.
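The Data Filters option is easier to reason about with a concrete rule in mind. The sketch below assumes that patterns are matched against a dotted name such as `database.schema.table`, from the start of the name and ignoring case, and that an object is ingested if it matches at least one include pattern and no exclude pattern. These matching semantics, the object names, and the helper names are assumptions for illustration; check the filter fields in the console for the exact format they expect.

```python
import re


def is_included(name, include=(".*",), exclude=()):
    """Keep an object if it matches any include pattern and no exclude pattern.

    Assumed semantics: re.match (anchored at the start of the name), case-insensitive.
    """
    def matches_any(patterns):
        return any(re.match(p, name, re.IGNORECASE) for p in patterns)

    return matches_any(include) and not matches_any(exclude)


def filter_objects(names, include=(".*",), exclude=()):
    """Apply is_included to a list of dotted object names."""
    return [n for n in names if is_included(n, include, exclude)]


# Hypothetical object names in database.schema.table form.
objects = [
    "sales.public.orders",
    "sales.public.customer_transactions",
    "sales.tmp.orders_backup",
    "hr.public.employees",
]
# Include the sales database, but exclude anything in its tmp schema.
selected = filter_objects(objects, include=[r"sales\."], exclude=[r"sales\.tmp\."])
```

Because `re.match` anchors only at the start, add `$` to a pattern when it must match the whole name.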
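To make the Data Profiling options more concrete, the sketch below computes roughly what the per-column options report on a small in-memory sample, and mirrors the Table size limit, Table row limit, and Limit settings as simple checks. It illustrates the statistics only, not how Data Catalog computes them: the function names, the equal-width histogram, and quartiles as the default quantiles are all assumptions.

```python
import statistics
from collections import Counter


def should_profile(size_gb, row_count, size_limit_gb=None, row_limit=None):
    """Table size limit / Table row limit: skip tables above either threshold."""
    if size_limit_gb is not None and size_gb > size_limit_gb:
        return False
    if row_limit is not None and row_count > row_limit:
        return False
    return True


def apply_limit(rows, limit=0):
    """Limit: 0 means profile every row, otherwise take the first `limit` rows."""
    return rows if limit == 0 else rows[:limit]


def profile_column(values, n_quantiles=4, n_bins=4):
    """Compute per-column statistics roughly matching the Data Profiling options."""
    non_null = [v for v in values if v is not None]
    stats = {
        "null_count": len(values) - len(non_null),      # Enable field null count
        "distinct_count": len(set(non_null)),           # Enable distinct value count
        "value_frequencies": dict(Counter(non_null)),   # Enable distinct value frequency count
    }
    is_numeric = non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if not is_numeric:
        return stats  # the remaining options apply to numeric columns only

    lo, hi = min(non_null), max(non_null)
    stats.update(
        min=lo,                                  # Enable field min value
        max=hi,                                  # Enable field max value
        mean=statistics.mean(non_null),          # Enable field mean value
        median=statistics.median(non_null),      # Enable field median value
    )
    if len(non_null) >= 2:  # stdev and quantiles need at least two values
        stats["stddev"] = statistics.stdev(non_null)  # sample standard deviation
        stats["quantiles"] = statistics.quantiles(non_null, n=n_quantiles)

    # Enable field histogram: equal-width bins between min and max (an assumption).
    width = (hi - lo) / n_bins or 1
    counts = [0] * n_bins
    for v in non_null:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    stats["histogram"] = counts
    return stats
```

For example, `profile_column([3, None, 1, 4, 1, 5, None, 9, 2, 6])` reports two NULLs, seven distinct values, a mean of 3.875, and a median of 3.5.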

Connect an external MCP server

Connecting in AI Studio

Management console
  1. Navigate to data-folder.

  2. Select AI Studio.

  3. In the left-hand panel, select MCP servers and click Create MCP server. In the window that opens:

    1. Under Add method, select Connect.

    2. Under Tools, click Add tools. In the window that opens, configure the MCP server connection:

      • Transport: Streamable HTTP.

      • URL: https://datacatalog-consumer.mcp.cloud.yandex.net/mcp

      • Authorization type: Access token.

      • Under Authorization header, set the Value field to Bearer <IAM_token>, where <IAM_token> is an IAM token obtained for the service account you created earlier.

        Note

        The IAM token lifetime does not exceed 12 hours; however, we recommend requesting a token more often, e.g., every hour.

    3. Click Connect.

    4. In the Add tools window that opens, select all tools and click Add.

    5. Under Server parameters:

      1. In the Name field, enter a name for the new MCP server. Follow these naming requirements:

        • Length: between 3 and 63 characters.
        • It can only contain lowercase Latin letters, numbers, and hyphens.
        • It must start with a letter and cannot end with a hyphen.
      2. Optionally, add a description and labels for the server you are creating by using the corresponding buttons.

      3. In the Access field, select Private.
      4. In the Service account field, select the service account you previously created.
      5. Optionally, turn on the Enable logging option and configure the logging settings to keep a log of the MCP server you are creating.

    6. Click Save.

  4. In the left-hand panel, select Agents and click Create agent.

  5. Specify the agent settings:

    • Name: Agent name.
    • Model: Language model.
    • Under Instructions, select a ready-made system instruction template for the agent or describe how the agent should behave and what it should do.
    • Under Tools:
      • Click Add and select Add MCP.
      • In the list, select the MCP server you created earlier and click Select.
      • In the Default behavior for all tools field, select Confirmation not needed.
      • Click Create and continue.
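If you create MCP servers from scripts or templates, it can help to check the name against the naming requirements listed above before submitting it. The sketch below encodes those requirements as a regular expression and also reports which rule a name breaks; the function names are hypothetical, and the console remains the authority on what it accepts.

```python
import re
from typing import List

# Naming requirements from this tutorial: 3 to 63 characters; lowercase Latin
# letters, digits, and hyphens only; starts with a letter; no trailing hyphen.
NAME_RE = re.compile(r"[a-z][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_mcp_server_name(name: str) -> bool:
    """Return True if name satisfies all the naming requirements."""
    return NAME_RE.fullmatch(name) is not None


def name_problems(name: str) -> List[str]:
    """List which requirements name violates (empty list if none)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if re.search(r"[^a-z0-9-]", name):
        problems.append("only lowercase Latin letters, digits, and hyphens are allowed")
    if not re.match(r"[a-z]", name):
        problems.append("must start with a letter")
    if name.endswith("-"):
        problems.append("must not end with a hyphen")
    return problems
```

The regular expression gives a quick yes/no answer, while `name_problems` is more useful for error messages.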

Connecting to an external AI agent

  1. Get an IAM token for the service account you created earlier.

    Note

    The IAM token lifetime does not exceed 12 hours; however, we recommend requesting a token more often, e.g., every hour.

  2. Specify the Data Catalog MCP server configuration for your agent:

    {
      "mcpServers": {
        "yandex-cloud-datacatalog-consumer": {
          "type": "streamableHttp",
          "url": "https://datacatalog-consumer.mcp.cloud.yandex.net/mcp",
          "headers": {
            "Authorization": "Bearer <IAM_token>"
          }
        }
      }
    }
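Because the IAM token in the Authorization header expires, an external agent setup typically needs to refresh it and rewrite its configuration. The sketch below assumes you supply a hypothetical `fetch_token` callable, caches the token for one hour as the note above recommends, and renders the same mcpServers configuration with the current token. Where your agent reads this configuration from, and whether it picks up changes without a restart, depends on the agent.

```python
import json
import time


class IamTokenCache:
    """Cache an IAM token and fetch a new one after refresh_after_s seconds.

    fetch_token is a callable you supply (hypothetical here); clock is
    injectable so the refresh logic can be tested without waiting.
    """

    def __init__(self, fetch_token, refresh_after_s=3600, clock=time.monotonic):
        self._fetch = fetch_token
        self._refresh_after = refresh_after_s  # 1 hour, well under the 12-hour lifetime
        self._clock = clock
        self._token = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._refresh_after:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token


def render_mcp_config(iam_token):
    """Return the mcpServers configuration from this step with a concrete token."""
    config = {
        "mcpServers": {
            "yandex-cloud-datacatalog-consumer": {
                "type": "streamableHttp",
                "url": "https://datacatalog-consumer.mcp.cloud.yandex.net/mcp",
                "headers": {"Authorization": f"Bearer {iam_token}"},
            }
        }
    }
    return json.dumps(config, indent=2)
```

One option for `fetch_token` is a function that runs `yc iam create-token` via `subprocess.run(["yc", "iam", "create-token"], capture_output=True, text=True, check=True)` under a CLI profile authenticated as the service account and returns the stripped `stdout`; you would then write `render_mcp_config(cache.get())` to the file your agent reads.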
    

Test a conversation with the agent

Tip

If you are using the agent in AI Studio, test it in the Agent testing panel on the right.

  1. Start a conversation with the agent by specifying the data catalog ID as shown below:

    Use the marked-up data in the apah36iavgh5******** data catalog.
    
  2. Try the example prompts below. To answer them, the agent analyzes the marked-up data in Data Catalog. The examples assume that the data contains sales information:

    • Write an SQL query to generate YoY sales analytics
    • Find all tables with user payment information
    • Which tables are marked as containing sensitive data?
    • Where does the customer_transactions table get its data from?
    • Help find the tables needed to calculate the user retention metric
    • Where can I find the website users' behavior data?
    • Which data should I use to analyze sales funnel conversion rate?
    • Show all dependencies of the transactions table to see how schema changes affect it
