Reading data from Iceberg tables
This section covers the basics of working with Iceberg tables.
To read data from an Iceberg table located in Yandex Object Storage, follow these steps:
- Create a connection containing your access credentials for an Iceberg catalog.
- Run a query against the required table in the catalog.
Query example for reading data from an Iceberg table:
SELECT * FROM iceberg_connection.my_table
Where:
- iceberg_connection: Iceberg catalog connection name.
- my_table: Iceberg table name.
Setting up a connection
To create a connection to the Iceberg catalog:
- In the management console, select the folder where you want to create a connection.
- Navigate to Yandex Query.
- In the left-hand panel, switch to the Connections tab.
- Click Create new.
- Specify the connection parameters:
  - Under General parameters:
    - Name: Iceberg catalog connection name.
    - Type: Iceberg.
  - Under Connection type parameters:
    - Bucket auth: Select Public or Private depending on the type of read access to objects in the bucket.
      - For a public bucket, specify a name in the Bucket field.
      - For a private bucket:
        - Select the Cloud and Folder where the data source is located.
        - Select an existing bucket or create a new one.
        - Select an existing service account or create a new one. Assign it the storage.viewer role required to access the data. To use a service account, the iam.serviceAccounts.user role is required.
    - Directory: Directory containing the Hadoop directory inside the selected bucket.
- Click Create.
Query syntax
Queries against Iceberg tables use the following SQL syntax:
SELECT * FROM <connection>.<table_name>
Where:
- <connection>: Catalog connection name.
- <table_name>: Iceberg table name.
Limitations
Working with Iceberg tables comes with certain limitations.
- You can only query tables that were created as per version 1 of the Iceberg specification.
- You can only read tables from the Hadoop directory located in Yandex Object Storage.
- Table time travel, i.e., reading previous table snapshots, is not supported.
Supported data types
The table below lists the supported Iceberg data types and their corresponding Yandex Query (YQL) types.
| Iceberg data type | Yandex Query data type |
|---|---|
| boolean | Bool |
| int | Int32 |
| long | Int64 |
| float | Float |
| double | Double |
| date | Date |
| time | Utf8 |
| timestamp | Utf8 |
| string | Utf8 |
| binary | String |
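The mapping can also be written as a lookup table, for example to check a table schema before querying it. This is a minimal Python sketch: the ICEBERG_TO_YQL dictionary repeats the type table from this section, while the unsupported_columns helper and the example schema are hypothetical and not part of Yandex Query.

```python
# Iceberg-to-YQL type mapping, copied from the type table in this section.
ICEBERG_TO_YQL = {
    "boolean": "Bool",
    "int": "Int32",
    "long": "Int64",
    "float": "Float",
    "double": "Double",
    "date": "Date",
    "time": "Utf8",
    "timestamp": "Utf8",
    "string": "Utf8",
    "binary": "String",
}


def unsupported_columns(schema: dict[str, str]) -> list[str]:
    """Return the names of columns whose Iceberg type is not in the mapping."""
    return [
        name
        for name, iceberg_type in schema.items()
        if iceberg_type not in ICEBERG_TO_YQL
    ]


# Hypothetical schema: column name -> Iceberg type.
example_schema = {
    "id": "long",
    "created_at": "timestamp",
    "price": "decimal(10, 2)",
}

# Lists the columns whose types do not appear in the mapping.
print(unsupported_columns(example_schema))
```

Note that time and timestamp values map to Utf8, so a query receives them as strings.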
Pruning
A query against an Iceberg table may contain filters specified in a WHERE clause. These filters reduce the amount of data to process. This reduction is called pruning.
Pruning is performed both at the read planning stage and at the reading stage.
At the read planning stage, the system:
- Reads the Iceberg table metadata.
- Uses metadata statistics to determine the list of data files to be read.
- Provides the selected files for reading.
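The planning-stage steps above can be pictured with a small Python sketch. It is an illustration only, not the Yandex Query implementation: the DataFile class, the plan_files function, the file paths, and the minimum and maximum values are all hypothetical, and the sketch assumes the metadata keeps a value range of the filtered column for each data file.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical metadata entry for one data file."""
    path: str
    min_value: int  # smallest value of the filtered column in the file
    max_value: int  # largest value of the filtered column in the file


def plan_files(files: list[DataFile], lo: int, hi: int) -> list[str]:
    """Keep only files whose [min, max] range overlaps the filter range [lo, hi]."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    DataFile("data/file-0", 1, 100),
    DataFile("data/file-1", 101, 200),
    DataFile("data/file-2", 201, 300),
]

# Emulates the filter: WHERE id BETWEEN 150 AND 250.
# file-0 cannot contain matching rows, so it is excluded from the read plan.
print(plan_files(files, 150, 250))
```

The file contents are never opened in this step. The decision relies on statistics alone, which is why a selective filter can shrink the read plan before any data is read.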
At the reading stage, the system:
- Splits data files into row groups.
- Reads row group statistics.
- Uses the statistics to determine which row groups need to be read.
- Reads data from the selected groups.
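The reading-stage steps above follow the same idea at a finer granularity. The Python sketch below is a simplified model under stated assumptions, not the actual reader: the RowGroup class, the read_matching function, and the sample values are hypothetical, and each row group is assumed to carry minimum and maximum statistics for the filtered column.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group with statistics for a single column."""
    min_value: int
    max_value: int
    rows: list[int]


def read_matching(row_groups: list[RowGroup], threshold: int) -> tuple[list[int], int]:
    """Emulate WHERE value > threshold, skipping row groups by their statistics.

    Returns the matching values and the number of row groups actually read.
    """
    result: list[int] = []
    groups_read = 0
    for group in row_groups:
        if group.max_value <= threshold:
            continue  # statistics prove that no row in this group can match
        groups_read += 1
        # The filter still runs on each row: statistics only rule groups out.
        result.extend(v for v in group.rows if v > threshold)
    return result, groups_read


groups = [
    RowGroup(1, 3, [1, 2, 3]),
    RowGroup(4, 6, [4, 5, 6]),
    RowGroup(7, 9, [7, 8, 9]),
]

rows, groups_read = read_matching(groups, 5)
print(rows, groups_read)  # prints: [6, 7, 8, 9] 2
```

In this model the first row group is skipped without reading its rows, while the second is read in full even though only one of its values passes the filter.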