Updating a data ingestion
Note
This feature is in the Preview stage.
- In the management console, select the resource folder where you created the metadata catalog.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Data Catalog.
- In the list that opens, select the metadata catalog containing the ingestion you want to update.
- In the left-hand panel, select Ingestions.
- In the list of ingestions, click the icon in the row with the ingestion and select Edit.
- Edit the ingestion settings:
  - In the Name field, specify a new unique name for the ingestion.
  - Optionally, edit the ingestion description.
  - Select a new data source.
  - Under PostgreSQL Ingestion Configuration:
    - Update the ingestion schedule:
      - Monthly: Select the dates and the ingestion start and end time.
      - Weekly: Select the days of the week and the ingestion start and end time.
        Note
        If scheduled for Monthly or Weekly, the ingestion starts at the specified time and stops as soon as new data has been ingested. If errors occur during ingestion, it restarts until the data has been ingested or the specified end time is reached.
      - Daily: Select time intervals for ingestion.
      - Manually: For manual start only.
    - Optionally, under Data Filters, use regular expressions to specify which databases and database objects to include in or exclude from the ingestion.
    - Under Metadata Types, select the metadata types to extract from the source.
    - Optionally, under Data Profiling:
      - Select Enable Profiling to perform data profiling, i.e., to analyze the data being extracted and collect statistics on it.
      - Select Profile Table Level Only to skip column-level profiling. With this option on, data characteristics are collected only for the table as a whole.
      - In the Max Workers field, specify the number of computing threads for profiling.
      - In the Sample Size field, specify the number of rows to sample for column profiling. This setting applies when the Use Sampling option is enabled.
      - In the Table Size Limit (GB) field, specify the table size in GB above which the table will be excluded from profiling.
      - In the Table Row Count Limit field, specify the number of rows above which the table will be excluded from profiling.
      - Specify which data characteristics to extract from the source:
        - include_field_null_count: Number of NULL rows per table or column.
        - include_field_distinct_count: Number of rows with distinct values per table or column.
        - include_field_min_value: Minimum value per table or column.
        - include_field_max_value: Maximum value per table or column.
        - include_field_mean_value: Average value per table or column.
        - include_field_median_value: Median value per table or column.
        - include_field_stddev_value: Standard deviation per table or column.
        - include_field_sample_values: Data slices, i.e., several consecutive values for each column.
    - Under Metadata Processing, select the image for metadata processing:
      - Enable Use File Cache to improve ingestion performance.
- Click Apply.
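
To illustrate how include and exclude regular expressions under Data Filters might combine, here is a minimal Python sketch. The matching rules are assumptions, not documented service behavior: patterns must match the whole object name, an empty include list admits everything, and exclude patterns take priority. The `is_selected` helper and the object names are hypothetical.

```python
import re


def is_selected(name: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if `name` passes the include/exclude regex filters.

    Assumed semantics (not confirmed by the service documentation):
    a pattern must match the whole name, an empty include list admits
    every name, and a matching exclude pattern always wins.
    """
    included = not include or any(re.fullmatch(p, name) for p in include)
    excluded = any(re.fullmatch(p, name) for p in exclude)
    return included and not excluded


# Hypothetical object names in database.schema.table form.
objects = ["sales.public.orders", "sales.public.orders_tmp", "hr.public.staff"]
include = [r"sales\..*"]   # keep everything in the sales database
exclude = [r".*_tmp"]      # drop temporary tables

selected = [o for o in objects if is_selected(o, include, exclude)]
print(selected)  # ['sales.public.orders']
```

Testing patterns against a list of known object names in this way can help catch an overly broad or overly narrow expression before you save the ingestion.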
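
The data characteristics listed under Data Profiling can be pictured with a short Python sketch that computes each of them for a single column. This only illustrates what the options mean and is not the service's implementation. The `profile_column` function is hypothetical, and it assumes that NULL values are excluded before computing statistics, that the standard deviation is the sample one, and that sample values are the first few non-NULL values.

```python
import statistics


def profile_column(values, sample_size=3):
    """Compute column statistics named after the include_field_* options.

    Assumptions: None stands for NULL, NULLs are ignored by every
    statistic except null_count, and the column holds at least two
    non-NULL numeric values (required by statistics.stdev).
    """
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_value": min(non_null),
        "max_value": max(non_null),
        "mean_value": statistics.mean(non_null),
        "median_value": statistics.median(non_null),
        "stddev_value": statistics.stdev(non_null),  # sample standard deviation
        "sample_values": non_null[:sample_size],     # consecutive leading values
    }


# Hypothetical column with two NULLs.
column = [4, None, 8, 15, 16, None, 23, 42]
profile = profile_column(column)
for key, value in profile.items():
    print(f"{key}: {value}")
```

Statistics such as the median and the distinct count need to see every value in the column, which suggests why profiling large tables is costly and why the Sample Size, Table Size Limit (GB), and Table Row Count Limit settings exist.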