Getting started as a metadata steward
Note
This feature is at the Preview stage.
When using Yandex Data Catalog as a metadata steward, you can collect and organize metadata about Yandex Cloud objects and the links between them.
Data Catalog can collect metadata from the following services:
- PostgreSQL
- MySQL®
- ClickHouse®
- Yandex Data Transfer
- WebSQL
- Yandex StoreDoc / MongoDB
- OpenSearch
- Greenplum®
To get started:
- Create a metadata catalog.
- Create a metadata source.
- Create a scheduled metadata ingestion from the source.
- Test the ingested metadata.
- Create a classification and tags.
- Create a domain and subdomains.
- Create a glossary and terms.
- Mark up the obtained data.
Required paid resources
The infrastructure cost includes fees for Yandex Managed Service for PostgreSQL cluster computing resources, storage, and backups (see Managed Service for PostgreSQL pricing).
Getting started
- Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.
- If you do not have a resource folder, create one:
  - In the management console, select the appropriate cloud from the list on the left.
  - At the top right, click Create folder.
  - Give your folder a name. The naming requirements are as follows:
    - It must be from 2 to 63 characters long.
    - It can only contain lowercase Latin letters, numbers, and hyphens.
    - It must start with a letter and cannot end with a hyphen.
  - Optionally, enter a description for the folder.
  - Select Create a default network. This creates a network with subnets in each availability zone. The network also gets a default security group, within which all network traffic is allowed.
  - Click Create.
- Assign your Yandex Cloud account the following roles for the resource folder:
  - data-catalog.dataSteward: To create and manage Data Catalog resources.
  - vpc.user: To use the cluster network.

  Note
  If you are unable to manage roles, contact your cloud or organization administrator.
- Create a Managed Service for PostgreSQL cluster to use as a test data source.
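You may want to check a folder name before submitting the form, for example when you create folders from scripts. The folder naming requirements above fit in a single regular expression. This is a minimal Python sketch. `is_valid_folder_name` is a hypothetical helper and is not part of any Yandex Cloud SDK. The management console remains the final authority on which names it accepts.

```python
import re

# Folder naming rules from the steps above:
#   - 2 to 63 characters long
#   - only lowercase Latin letters, digits, and hyphens
#   - starts with a letter and does not end with a hyphen
# One leading letter + 0..61 middle characters + one final letter/digit = 2..63 chars.
FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    """Return True if `name` meets the folder naming requirements."""
    return FOLDER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["data-catalog-test", "a", "1folder", "folder-", "My-Folder"]:
        print(f"{candidate!r}: {is_valid_folder_name(candidate)}")
```

Using `fullmatch` makes the whole string match the pattern, so no explicit `^`/`$` anchors are needed.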
Create a metadata catalog
- In the management console, select the resource folder you prepared earlier.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Data Catalog.
- Click Create catalog.
- Set a Name for the metadata catalog.
- Set a Description for the metadata catalog.
- Click Create.
Note
When you create a metadata catalog, AI markup of metadata is enabled by default.
With this option enabled, the AI assistant suggests descriptions, domains, classifications and tags, and glossaries and terms, and uses them to mark up your metadata. To confirm, edit, or reject a suggestion, hover over the AI icon next to it and select the action you need.
After the catalog is created, you can manage AI markup on the Overview page or when updating the catalog.
Create a metadata source and ingestion
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Data sources tab and click Create data source.
- Set a Name for the source.
- Set a Description for the source.
- Set Database type to PostgreSQL.
- Under PostgreSQL source, set the following parameters:
  - Folder ID: ID of the resource folder where the Managed Service for PostgreSQL cluster was created.
  - Installation type: Cluster Managed Service for PostgreSQL.
  - Cluster for Managed DB: Managed Service for PostgreSQL cluster you created earlier.
  - Connection ID: ID of the Yandex Connection Manager connection to the Managed Service for PostgreSQL cluster you created earlier.
  - Database name: Name of the database in the Managed Service for PostgreSQL cluster you created earlier.
  - Upload from all databases: Optionally, enable this setting to upload metadata from all databases in the cluster.
  - Network ID: ID of the network the cluster is connected to.
- Click Create.
- This opens the list of sources in the metadata catalog. The row with the new source shows No ingestion.
- Hover over this message and click Create ingestion in the window that opens.
- Set a Name for the ingestion.
- Set a Description for the ingestion.
- In the Schedule field, select Daily.
- In the Start time and End time fields, specify when the ingestion runs.
- Click Create.
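The PostgreSQL source form combines several IDs with an optional database scope. The Python sketch below models the form as a plain record so you can see which values to collect before clicking Create. `PostgresSourceForm` and `missing_fields` are hypothetical names and are not part of the Data Catalog API. One rule in the sketch is an assumption drawn from the field descriptions: a database name is needed only when Upload from all databases is disabled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PostgresSourceForm:
    """Hypothetical mirror of the PostgreSQL source form fields (not an SDK type)."""
    folder_id: str
    cluster_id: str        # "Cluster for Managed DB"
    connection_id: str     # Yandex Connection Manager connection
    network_id: str
    database_name: Optional[str] = None
    upload_from_all_databases: bool = False


def missing_fields(form: PostgresSourceForm) -> List[str]:
    """List the fields that still need a value before the source is created.

    Assumption: a database name is required only when
    "Upload from all databases" is disabled.
    """
    missing = [
        name
        for name in ("folder_id", "cluster_id", "connection_id", "network_id")
        if not getattr(form, name)
    ]
    if not form.upload_from_all_databases and not form.database_name:
        missing.append("database_name")
    return missing


if __name__ == "__main__":
    form = PostgresSourceForm(
        folder_id="<folder_id>",
        cluster_id="<cluster_id>",
        connection_id="<connection_id>",
        network_id="",
    )
    print(missing_fields(form))  # ['network_id', 'database_name']
```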
Test the ingested metadata
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Metadata search tab. The page that opens shows the metadata ingested from the Managed Service for PostgreSQL cluster you created earlier.
Tip
The ingested metadata also appears on the source's Uploaded data tab.
Create a classification and tags
Create a classification
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Tags and classifications tab and click Create classification.
- Set a Name for the classification.
- Set a Description for the classification.
- Click Create.
Create tags
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Tags and classifications tab and open the previously created classification.
- Click Create tag.
- In the window that opens, set a Name for the tag.
- Set a Description for the tag.
- Click Create.
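Tags are created inside a classification, so a classification works as a named group of tags. The sketch below shows that two-level structure in Python. `Classification` and `create_tag` are hypothetical. Rejecting duplicate tag names within one classification is an assumption made for this example, not documented Data Catalog behavior.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Classification:
    """Illustrative classification that groups tags (not a Data Catalog SDK type)."""
    name: str
    description: str = ""
    tags: Dict[str, str] = field(default_factory=dict)  # tag name -> tag description

    def create_tag(self, name: str, description: str = "") -> None:
        # Assumption: tag names are unique within one classification.
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists in {self.name!r}")
        self.tags[name] = description


if __name__ == "__main__":
    sensitivity = Classification("Sensitivity", "How sensitive the data is")
    sensitivity.create_tag("PII", "Personally identifiable information")
    sensitivity.create_tag("Public", "Safe to share")
    print(sorted(sensitivity.tags))  # ['PII', 'Public']
```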
Create a domain and subdomains
Create a domain
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Domains tab and click Create domain.
- Set a Name for the domain.
- Set a Description for the domain.
- Add one or more previously created tags.
- Click Create.
Create a subdomain
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Domains tab and select the previously created domain.
- Click Add subdomain.
- Set a Name for the subdomain.
- Set a Description for the subdomain.
- Add one or more previously created tags.
- Click Create.
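Subdomains are created inside a domain, which turns domains into a tree. The following sketch models that tree with a recursive Python dataclass and walks it depth-first. `Domain`, `add_subdomain`, and `walk` are hypothetical names. The steps above show only one level of subdomains; the sketch allows deeper nesting purely for simplicity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Domain:
    """Illustrative domain node; subdomains are nested Domain objects."""
    name: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    subdomains: List["Domain"] = field(default_factory=list)

    def add_subdomain(self, name: str, description: str = "") -> "Domain":
        child = Domain(name, description)
        self.subdomains.append(child)
        return child


def walk(domain: Domain, depth: int = 0) -> Iterator[Tuple[int, Domain]]:
    """Depth-first traversal yielding (depth, domain) pairs, parent before children."""
    yield depth, domain
    for child in domain.subdomains:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    sales = Domain("Sales", "Sales data", tags=["PII"])
    sales.add_subdomain("Orders")
    sales.add_subdomain("Customers")
    for depth, node in walk(sales):
        print("  " * depth + node.name)
```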
Create a glossary and terms
Create a glossary
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Terms and glossaries tab and click Create glossary.
- Set a Name for the glossary.
- Set a Description for the glossary.
- Add one or more previously created tags.
- Click Create.
Create terms
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Terms and glossaries tab and select the previously created glossary.
- Click Create term.
- Set a Name for the term.
- Set a Description for the term.
- Specify synonyms for the term.
- Add one or more previously created tags.
- Add linked terms.
- Click Create.
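A term carries synonyms and links to other terms, so readers can find it under different names. The sketch below shows one way to model a glossary in Python and resolve a word to a term by its name or by a synonym. The class and method names are hypothetical. Treating term links as symmetric (if A is linked to B, then B is linked to A) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    name: str
    description: str = ""
    synonyms: Set[str] = field(default_factory=set)
    tags: List[str] = field(default_factory=list)
    linked_terms: Set[str] = field(default_factory=set)  # names of other terms


@dataclass
class Glossary:
    name: str
    description: str = ""
    terms: Dict[str, Term] = field(default_factory=dict)

    def add_term(self, term: Term) -> None:
        self.terms[term.name] = term

    def link(self, a: str, b: str) -> None:
        """Link two existing terms in both directions (assumed symmetric)."""
        self.terms[a].linked_terms.add(b)
        self.terms[b].linked_terms.add(a)

    def resolve(self, word: str) -> Optional[Term]:
        """Find a term by its name or one of its synonyms, ignoring case."""
        needle = word.casefold()
        for term in self.terms.values():
            if term.name.casefold() == needle:
                return term
            if needle in {s.casefold() for s in term.synonyms}:
                return term
        return None


if __name__ == "__main__":
    g = Glossary("Business terms")
    g.add_term(Term("Customer", synonyms={"Client", "Buyer"}))
    g.add_term(Term("Order"))
    g.link("Customer", "Order")
    print(g.resolve("client").name)  # Customer
```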
Mark up the obtained data
- In the management console, navigate to the metadata catalog you created earlier.
- Go to the Metadata search tab.
- Click the icon next to the dataset you need and select Set domain, Edit tags, or Edit terms.
- In the window that opens, select objects from the hierarchy of domains, tags, or terms. Use the search if needed.
- Add the selected objects.
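Marking up a dataset means attaching a domain plus a set of tags and terms, which you often find by searching the hierarchy. The Python sketch below models that markup record and runs a simple case-insensitive search over a two-level hierarchy such as classifications and their tags. `DatasetMarkup`, `search`, the example names, and the `parent/child` path format are all hypothetical. They do not reflect how Data Catalog stores markup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetMarkup:
    """Hypothetical record of the markup attached to one dataset."""
    dataset: str
    domain: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    terms: List[str] = field(default_factory=list)

    def add_tags(self, values: List[str]) -> None:
        # Skip values that are already attached so repeated edits do not duplicate them.
        for value in values:
            if value not in self.tags:
                self.tags.append(value)

    def add_terms(self, values: List[str]) -> None:
        for value in values:
            if value not in self.terms:
                self.terms.append(value)


def search(hierarchy: Dict[str, List[str]], query: str) -> List[str]:
    """Case-insensitive substring search over a two-level hierarchy.

    Returns the parent name for parent matches and a hypothetical
    "parent/child" path for child matches.
    """
    needle = query.casefold()
    hits: List[str] = []
    for parent, children in hierarchy.items():
        if needle in parent.casefold():
            hits.append(parent)
        hits.extend(f"{parent}/{child}" for child in children if needle in child.casefold())
    return hits


if __name__ == "__main__":
    classifications = {"Sensitivity": ["PII", "Public"], "Quality": ["Verified"]}
    markup = DatasetMarkup("public.orders", domain="Sales")
    markup.add_tags(search(classifications, "pii"))
    print(markup)
```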
What's next
- Create a term in a glossary.
- Create a child term.
- Update a glossary.
- Edit a term in a glossary.
- Create a tag in a classification.
- Update a classification.
- Edit a tag in a classification.