Creating a data ingestion
Note
This feature is in the Preview stage.
- In the management console, select the resource folder you created the metadata catalog in.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Data Catalog.
- In the list that opens, select the metadata catalog where you want to create an ingestion.
- In the left-hand panel, select Ingestions.
- Click Create ingestion.
- Specify the ingestion settings:
- In the Name field, enter a unique name for the ingestion.
- Optionally, describe the ingestion.
- Select a data source or create one.
- Under PostgreSQL Ingestion Configuration:
- Select an ingestion schedule:
- Monthly: Select the dates and the ingestion start and end time.
- Weekly: Select the days of the week and the ingestion start and end time.
Note
If scheduled for Monthly or Weekly, the ingestion starts at the specified start time and stops as soon as the new data has been ingested. If errors occur during ingestion, it restarts until the data has been ingested or the specified end time is reached.
- Daily: Select time intervals for ingestion.
- Manually: For manual start only.
- Optionally, under Data Filters, use regular expressions to specify which databases and database objects to include or exclude from the ingestion.
- Under Metadata Types, select the metadata types to extract from the source.
- Optionally, under Data Profiling:
- Select Enable Profiling to perform data profiling, i.e., analysis and collection of statistics on the data being extracted.
- Select Profile Table Level Only to skip column-level profiling. With this option on, data characteristics are collected only for the table as a whole.
- In the Max Workers field, specify the number of computing threads for profiling.
- In the Sample Size field, specify the number of rows to sample for column profiling. This setting applies when the Use Sampling option is enabled.
- In the Table Size Limit (GB) field, specify the size in GB above which a table is excluded from profiling.
- In the Table Row Count Limit field, specify the number of rows above which a table is excluded from profiling.
- Specify which data characteristics to extract from the source:
- include_field_null_count: Number of NULL rows per table or column.
- include_field_distinct_count: Number of rows with different values per table or column.
- include_field_min_value: Minimum value per table or column.
- include_field_max_value: Maximum value per table or column.
- include_field_mean_value: Average value per table or column.
- include_field_median_value: Median value per table or column.
- include_field_stddev_value: Standard deviation per table or column.
- include_field_sample_values: Data slices, i.e., several consecutive values for each column.
- Under Metadata Processing, select the image for metadata processing:
- Enable Use File Cache to improve ingestion performance.
- Click Create.
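To illustrate how regular expressions under Data Filters can include or exclude objects, here is a minimal Python sketch. The function name `is_selected`, the object names, and the matching rules are assumptions made for illustration: the patterns are matched against the whole name, an empty include list admits everything, and an exclude match takes priority. The service may apply different rules.

```python
import re


def is_selected(name: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if `name` passes the include/exclude filters.

    Assumed semantics (not confirmed by the documentation): a pattern must
    match the whole name, an empty include list admits every name, and an
    exclude match always wins over an include match.
    """
    if any(re.fullmatch(pattern, name) for pattern in exclude):
        return False
    if not include:
        return True
    return any(re.fullmatch(pattern, name) for pattern in include)


# Hypothetical object names in database.schema.table form.
tables = ["sales.public.orders", "sales.public.orders_tmp", "hr.public.staff"]
include = [r"sales\..*"]   # keep everything in the sales database
exclude = [r".*_tmp"]      # drop temporary tables

selected = [t for t in tables if is_selected(t, include, exclude)]
print(selected)  # ['sales.public.orders']
```

Testing patterns locally against a few sample names like this can help catch filters that are broader or narrower than intended before you create the ingestion.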
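The Monthly and Weekly restart behavior described in the note can be pictured as a retry loop bounded by the end of the schedule window. The sketch below is only a model of that description: `run_in_window`, `FakeClock`, and the ten-minute step are hypothetical, and the service's actual retry policy (delays, partial progress) is not documented here.

```python
from datetime import datetime, timedelta
from typing import Callable


def run_in_window(
    attempt: Callable[[], bool],
    window_end: datetime,
    now: Callable[[], datetime],
) -> bool:
    """Retry `attempt` until it succeeds or the schedule window is over."""
    while now() < window_end:
        if attempt():
            return True   # data ingested, stop early
    return False          # window closed before a successful run


class FakeClock:
    """Simulated clock that advances by `step` on every reading."""

    def __init__(self, start: datetime, step: timedelta) -> None:
        self.current = start
        self.step = step

    def __call__(self) -> datetime:
        value = self.current
        self.current += self.step
        return value


start = datetime(2024, 1, 1, 2, 0)
window_end = start + timedelta(hours=1)

# Two failed attempts followed by a successful one.
outcomes = iter([False, False, True])
succeeded = run_in_window(
    lambda: next(outcomes), window_end, FakeClock(start, timedelta(minutes=10))
)
print(succeeded)  # True
```

If every attempt fails, the loop in this model ends when the clock reaches the end time, which corresponds to the ingestion stopping once the specified time is over.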
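As a rough picture of what the Data Profiling characteristics mean, the following Python sketch computes them for a single numeric column and applies the table limits. It is not the service's implementation: the function names, the use of the sample standard deviation, the exclusion of NULL values from the distinct count, and taking the first rows as sample values are all assumptions.

```python
import statistics
from typing import Optional


def within_limits(size_gb: float, row_count: int,
                  size_limit_gb: float, row_limit: int) -> bool:
    """A table is profiled only if it exceeds neither limit."""
    return size_gb <= size_limit_gb and row_count <= row_limit


def profile_column(values: list[Optional[float]], sample_size: int = 3) -> dict:
    """Compute per-column statistics named after the include_field_* options."""
    non_null = [v for v in values if v is not None]
    has_data = bool(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_value": min(non_null) if has_data else None,
        "max_value": max(non_null) if has_data else None,
        "mean_value": statistics.mean(non_null) if has_data else None,
        "median_value": statistics.median(non_null) if has_data else None,
        # Sample standard deviation needs at least two values.
        "stddev_value": statistics.stdev(non_null) if len(non_null) > 1 else None,
        # Several consecutive values from the start of the column.
        "sample_values": non_null[:sample_size],
    }


column = [4.0, None, 8.0, 6.0, 8.0, None]  # hypothetical column data

if within_limits(size_gb=0.5, row_count=len(column), size_limit_gb=5, row_limit=1_000_000):
    stats = profile_column(column)
    print(stats["null_count"], stats["distinct_count"])  # 2 3
    print(stats["mean_value"], stats["median_value"])    # 6.5 7.0
```

With Profile Table Level Only enabled, a per-column pass like `profile_column` would be skipped and only table-wide figures would be collected.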