© 2025 Direct Cursus Technology L.L.C.
Yandex MetaData Hub
Metadata catalog

Written by Yandex Cloud
Updated at November 13, 2025
Note

This feature is in the Preview stage.

Data Catalog allows you to collect, analyze, and mark up metadata from various sources. You can upload structural metadata, e.g., a list of tables in a managed database cluster, their schemas, and the links between tables.

You can use Data Catalog to:

  • Collect, store, and organize metadata.
  • Find a dashboard with relevant business indicators.
  • Analyze and interpret business indicators.
  • Find data for your business needs.
  • Find information sources behind a particular object.
  • Find data owners, including passive ownership through subscription.
  • Build a schema for data consumers.

Data Catalog is integrated with Yandex WebSQL, so you can send SQL queries to databases in the clusters that serve as Data Catalog sources.

The main entity in Data Catalog is a metadata catalog. A catalog serves as:

  • Hub for collecting and storing metadata from various sources.
  • Workspace for marking up metadata.

You can upload metadata into a catalog using sources and ingestions. Metadata resides in internal storage.

For basic markup, use domains and subdomains, e.g., to arrange metadata by company department. For more complex markup, use these resources:

  • Classifications and tags
  • Glossaries and terms

Uploading metadata

To upload metadata, use sources and ingestions.

A source is a connection used to upload metadata. It stores information about the database or service the metadata is ingested from, as well as the authentication data. Learn more about available backends.

A source can connect both to managed database clusters in Yandex Cloud and to custom installations of the same databases. It can also fetch links between objects based on ongoing data delivery in Yandex Data Transfer.

If you create multiple sources for the same DB instance or transfer within one catalog, a single data store object associated with that DB instance is created automatically. This object aggregates the metadata uploaded through all sources of that DB instance or cluster.
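To illustrate the aggregation rule, here is a minimal Python sketch that groups sources by the DB instance they point to. The `Source` class, its fields, and the source and cluster names are hypothetical and do not reflect the Data Catalog API; the sketch only models the one-data-store-per-instance behavior described above.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical model of a source: field names are illustrative,
# not the Data Catalog API.
@dataclass(frozen=True)
class Source:
    name: str
    db_instance_id: str  # DB instance, cluster, or transfer the source points to


def build_data_stores(sources: list[Source]) -> dict[str, list[str]]:
    """Group sources so that each DB instance gets exactly one data store object."""
    data_stores: dict[str, list[str]] = defaultdict(list)
    for source in sources:
        # All sources for the same instance feed the same data store object.
        data_stores[source.db_instance_id].append(source.name)
    return dict(data_stores)


sources = [
    Source("pg-prod-readonly", "pg-cluster-1"),
    Source("pg-prod-analytics", "pg-cluster-1"),
    Source("ch-events", "ch-cluster-7"),
]
print(build_data_stores(sources))
# {'pg-cluster-1': ['pg-prod-readonly', 'pg-prod-analytics'], 'ch-cluster-7': ['ch-events']}
```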

An ingestion is a process that connects to the data storage or service specified in the source and uploads its metadata into the catalog. In an ingestion, you can configure:

  • Filters to get only relevant metadata.
  • Profiling to export the statistical data you need.

Each ingestion is associated with exactly one source, but a source can have multiple ingestions. This means you can create several ingestions for one source, each with its own filters.
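The one-source-to-many-ingestions relationship and per-ingestion filters can be sketched as follows. This is a hypothetical model: the `Ingestion` class and the glob-style `include` patterns (matched with Python's standard `fnmatch`) are assumptions for illustration, not Data Catalog's actual filter syntax.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


# Hypothetical model: field names and glob-style filters are assumptions,
# not Data Catalog's actual filter format.
@dataclass
class Ingestion:
    name: str
    source_name: str  # each ingestion belongs to exactly one source
    include: list[str] = field(default_factory=lambda: ["*"])

    def select(self, object_names: list[str]) -> list[str]:
        """Return only the objects that match at least one include pattern."""
        return [
            obj for obj in object_names
            if any(fnmatch(obj, pattern) for pattern in self.include)
        ]


# Objects visible through one source.
tables = ["sales.orders", "sales.refunds", "hr.employees", "tmp.scratch"]

# Two ingestions sharing the same source, each with its own filters.
sales_ingestion = Ingestion("sales-daily", "pg-prod", include=["sales.*"])
hr_ingestion = Ingestion("hr-weekly", "pg-prod", include=["hr.*"])

print(sales_ingestion.select(tables))  # ['sales.orders', 'sales.refunds']
print(hr_ingestion.select(tables))     # ['hr.employees']
```

Because filters live on the ingestion rather than the source, the connection details are defined once and reused, while each ingestion narrows the upload to the metadata it needs.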

You can run an ingestion manually or configure it to run on a schedule. A scheduled ingestion always runs only once, even if the schedule specifies a period and not a specific hour.

Data Catalog has quotas for the maximum number of sources and ingestions in a catalog.

Metadata markup

Domains and subdomains

A domain represents a group of metadata. You can use domains to arrange metadata to meet your business process needs, e.g., by department or business unit. Within a domain, you can create subdomains for more granular grouping.

You can assign only one domain or subdomain to a metadata set or to any of its individual elements. However, different elements within a single metadata set can have different domains or subdomains.

Data Catalog has quotas for the maximum number of domains in a catalog. The maximum domain nesting depth is 5.
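A minimal sketch of the domain rules, assuming a simple in-memory tree: subdomains nest up to the stated depth of 5, and each metadata element maps to a single domain or subdomain. The `Domain` class and the `domain_of` mapping are hypothetical, and counting a top-level domain as level 1 is an assumption about how the depth limit is measured.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_DOMAIN_DEPTH = 5  # maximum domain nesting depth stated in this article


# Hypothetical in-memory model of domains; not the Data Catalog API.
@dataclass(eq=False)
class Domain:
    name: str
    parent: Domain | None = field(default=None, repr=False)
    children: list[Domain] = field(default_factory=list, repr=False)

    @property
    def depth(self) -> int:
        # Assumption: a top-level domain counts as level 1.
        return 1 if self.parent is None else self.parent.depth + 1

    def add_subdomain(self, name: str) -> Domain:
        if self.depth + 1 > MAX_DOMAIN_DEPTH:
            raise ValueError(f"cannot nest {name!r}: depth limit is {MAX_DOMAIN_DEPTH}")
        child = Domain(name, parent=self)
        self.children.append(child)
        return child


sales = Domain("Sales")
emea = sales.add_subdomain("EMEA")

# Build the deepest allowed chain: Sales > EMEA > Germany > Berlin > Retail.
chain = [sales, emea]
for name in ["Germany", "Berlin", "Retail"]:
    chain.append(chain[-1].add_subdomain(name))

try:
    chain[-1].add_subdomain("Outlets")  # would be level 6
    too_deep_rejected = False
except ValueError:
    too_deep_rejected = True

# Each element maps to exactly one domain or subdomain.
domain_of: dict[str, Domain] = {
    "orders": sales,        # the whole metadata set
    "orders.region": emea,  # an element of the same set, in another subdomain
}
```

Keying the mapping by element makes the one-domain rule structural: assigning a new domain to the same key replaces the old one rather than adding a second.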

Classifications and tags

A classification comprises tags used to mark up metadata.

Data Catalog has quotas for the maximum number of classifications in a catalog.

Tags are labels used to mark up data based on its type, e.g., sensitive data, table specifications, etc. You can assign multiple tags from one or more classifications to the same metadata set or its individual element. If the Mutually exclusive option is enabled for a classification, you can assign only one tag from that classification to a metadata set or its element.

In addition to metadata sets and their elements, you can assign tags to:

  • Domains and subdomains
  • Glossaries
  • Individual terms in a glossary

Data Catalog has quotas for the maximum number of tags in a classification.
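The mutually exclusive rule can be illustrated with a hypothetical tag-assignment check. The classes, classification names, and tags below are assumptions for illustration only; the sketch enforces that an object carries at most one tag from a mutually exclusive classification while allowing several tags from other classifications.

```python
from dataclasses import dataclass, field


# Hypothetical model of tag assignment; only the "mutually exclusive"
# rule comes from the article. Names and tags are illustrative.
@dataclass(frozen=True)
class Classification:
    name: str
    tags: frozenset[str]
    mutually_exclusive: bool = False


@dataclass
class TaggedObject:
    """A metadata set or element, domain, glossary, or term that can carry tags."""

    name: str
    tags: set[tuple[str, str]] = field(default_factory=set)  # (classification, tag)

    def assign(self, classification: Classification, tag: str) -> None:
        if tag not in classification.tags:
            raise ValueError(f"{tag!r} is not in {classification.name!r}")
        # A conflict is any other tag already taken from the same classification.
        conflict = any(
            cls == classification.name and existing != tag
            for cls, existing in self.tags
        )
        if classification.mutually_exclusive and conflict:
            raise ValueError(f"{classification.name!r} is mutually exclusive")
        self.tags.add((classification.name, tag))


sensitivity = Classification(
    "Sensitivity", frozenset({"public", "internal", "pii"}), mutually_exclusive=True
)
usage = Classification("Usage", frozenset({"reporting", "ml-training", "archive"}))

column = TaggedObject("customers.email")
column.assign(sensitivity, "pii")
column.assign(usage, "reporting")
column.assign(usage, "ml-training")  # allowed: Usage is not mutually exclusive

try:
    column.assign(sensitivity, "public")  # second Sensitivity tag
    second_exclusive_tag_rejected = False
except ValueError:
    second_exclusive_tag_rejected = True
```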

Glossaries and terms

A glossary is a dictionary of domain-specific terms and their definitions. Glossaries act as a single source of truth for terminology used within a company. Depending on how broadly a glossary is used, it may belong to one of these types:

  • Domain-specific: Includes terms relevant to a specific industry or business domain.
  • Project-related: Includes terms relevant to a particular project or multiple related projects.
  • Corporate: Includes terms relevant to all of the company’s projects and business areas.

Data Catalog has quotas for the maximum number of glossaries in a catalog.

Terms are used to label data based on how a business defines certain concepts, such as revenue, expenses, etc. For each term, you can specify a synonym or create a child term for more granular data markup. You can assign multiple terms to the same metadata set or its individual element. These may include:

  • Terms from different glossaries
  • Child terms of different parent terms

Data Catalog has quotas for the maximum number of terms in a glossary. The maximum nesting depth of a term is 5.
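A hypothetical sketch of terms, synonyms, and child terms, assuming a top-level term counts as depth 1. The `Term` class, glossary names, and the `terms_of` mapping are illustrative assumptions, not part of Data Catalog; the example shows one element carrying terms from two different glossaries.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_TERM_DEPTH = 5  # maximum term nesting depth stated in this article


# Hypothetical model of glossary terms; not the Data Catalog API.
@dataclass(eq=False)
class Term:
    name: str
    glossary: str
    synonyms: list[str] = field(default_factory=list)
    parent: Term | None = field(default=None, repr=False)

    @property
    def depth(self) -> int:
        # Assumption: a top-level term is at depth 1.
        return 1 if self.parent is None else self.parent.depth + 1

    def child(self, name: str) -> Term:
        if self.depth + 1 > MAX_TERM_DEPTH:
            raise ValueError(f"term nesting depth is limited to {MAX_TERM_DEPTH}")
        return Term(name, self.glossary, parent=self)

    def path(self) -> str:
        return self.name if self.parent is None else f"{self.parent.path()} > {self.name}"


# Terms from two different glossaries (names are illustrative).
revenue = Term("Revenue", glossary="Finance", synonyms=["Sales income"])
recurring = revenue.child("Recurring revenue")
churn = Term("Churn", glossary="Customer success")

# One metadata element may carry several terms, including terms from
# different glossaries and children of different parents.
terms_of = {"reports.mrr": [recurring, churn]}

for term in terms_of["reports.mrr"]:
    print(f"{term.glossary}: {term.path()}")
# Finance: Revenue > Recurring revenue
# Customer success: Churn
```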

Use cases

  • Creating a term in a glossary.
  • Creating a child term.
  • Updating a glossary.
  • Updating a term.
  • Creating a tag in a classification.
  • Updating a classification.
  • Updating a tag in a classification.
