Transferring data to an Apache Iceberg™ target endpoint
Yandex Data Transfer enables you to migrate data to Apache Iceberg™ tables in an Apache Hive™ Metastore cluster and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform the required operations on the tables and monitor the transfer progress.
Scenarios for transferring data to Apache Iceberg™
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
- ClickHouse®
- Greenplum®
- MongoDB
- MySQL®
- PostgreSQL
- Elasticsearch
- Yandex Object Storage
- Oracle
- Managed Service for YDB
- YTsaurus
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
Configuring the Apache Iceberg™ target endpoint
When creating or updating an endpoint, you can define:
- Settings for connecting to an Apache Hive™ Metastore cluster.
- Configuration settings for a Yandex Object Storage bucket or custom S3-compatible storage.
- Additional parameters.
Apache Hive™ Metastore cluster
Warning
To create or edit an endpoint of a managed database, you need the managed-metastore.viewer role or the primitive viewer role for the folder where the cluster of this managed database resides.
Connection with the cluster specified in Yandex Cloud.
- Apache Hive™ Metastore cluster: ID of the cluster whose folder is used for Apache Iceberg™ tables.
- Security groups: Select the cloud network to host the endpoint and security groups for network traffic. This will allow you to apply the specified security group rules to the VMs and clusters in the selected network without changing their settings. For more information, see Networking in Yandex Data Transfer.
  Make sure the selected security groups are configured.
Bucket configurations
Yandex Object Storage bucket:
- Bucket: Name of the bucket to upload source data to.
- Service account: Select or create a service account with the storage.uploader role that Data Transfer will use to connect to the bucket.
Custom S3-compatible storage:
- (Optional) Endpoint: Endpoint for an Amazon S3-compatible service. Leave this field empty to use Amazon S3.
- Region: Region to send requests to.
- Bucket: Bucket name.
- Access Key ID and Secret Access Key: ID and contents of the AWS key used to access a private bucket.
- (Optional) Path prefix: Path prefix for writing objects to the bucket.
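As a rough illustration of which of the S3-compatible storage fields above are mandatory and which can be left empty, here is a minimal Python sketch. The `CustomS3Storage` class and the `validate` and `uses_amazon_s3` helpers are hypothetical names introduced for this example only and are not part of any Yandex Data Transfer API. The sketch also assumes that every field not marked as optional in the list must be filled in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CustomS3Storage:
    """Hypothetical container mirroring the S3-compatible storage fields."""
    region: str
    bucket: str
    access_key_id: str
    secret_access_key: str
    endpoint: Optional[str] = None     # optional: empty means Amazon is used
    path_prefix: Optional[str] = None  # optional: prefix for written objects


def validate(cfg: CustomS3Storage) -> List[str]:
    """Return a list of problems; an empty list means no required field is blank.

    Assumption: fields not marked optional in the list above are required.
    """
    problems = []
    for name in ("region", "bucket", "access_key_id", "secret_access_key"):
        if not getattr(cfg, name).strip():
            problems.append(f"{name} is required")
    return problems


def uses_amazon_s3(cfg: CustomS3Storage) -> bool:
    """An empty Endpoint field means requests go to Amazon."""
    return not (cfg.endpoint or "").strip()
```

Calling `validate` before filling in the endpoint form can catch blank required fields early; it does not check that the key actually grants access to the bucket.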
Additional settings
- Cleanup policy: Select a way to clean up data in the target database before the transfer:
  - DISABLED: Use the existing tables to write new data.
  - DROP: Remove all tables involved in the transfer. Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
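The difference between the two cleanup policies can be sketched with a toy in-memory model. This is only an illustration of the behavior described above, not how Data Transfer is implemented: the `apply_cleanup` function and the dictionary standing in for the target are assumptions made for this example.

```python
from enum import Enum
from typing import Dict, Iterable, List


class CleanupPolicy(Enum):
    DISABLED = "DISABLED"
    DROP = "DROP"


def apply_cleanup(
    target: Dict[str, List[dict]],
    transfer_tables: Iterable[str],
    policy: CleanupPolicy,
) -> Dict[str, List[dict]]:
    """Return the target state after cleanup, before any new data is written.

    `target` maps table names to rows and is a toy stand-in for the target.
    """
    if policy is CleanupPolicy.DROP:
        # DROP: remove every table involved in the transfer, so each one is
        # recreated later with the latest schema from the source.
        dropped = set(transfer_tables)
        return {name: rows for name, rows in target.items() if name not in dropped}
    # DISABLED: keep the existing tables; new data is written into them.
    return dict(target)
```

For example, with tables `orders`, `users`, and `audit` in the target and a transfer that covers `orders` and `users`, DROP leaves only `audit`, while DISABLED leaves all three tables in place.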
After configuring the data source and target, create and start the transfer.