Reading data from Iceberg tables
This section provides basic information about using Iceberg tables.
To read data from an Iceberg table located in Yandex Object Storage, follow these steps:
- Create a connection that contains the details for connecting to an Iceberg folder.
- Run a query against the table of interest in that folder.
Example of a query that reads data from an Iceberg table:
SELECT * FROM iceberg_connection.my_table
Where:
- iceberg_connection: Name of your new connection to the Iceberg folder.
- my_table: Name of your table in the Iceberg folder.
Setting up a connection
To create a connection to an Iceberg folder:
- In the management console, select the folder where you want to create a connection.
- Go to Yandex Query.
- In the left-hand panel, go to the Connections tab.
- Click Create new.
- Specify the connection parameters:
  - Under General parameters:
    - Name: Name of your connection to the Iceberg folder.
    - Type: Iceberg.
  - Under Connection type parameters:
    - Bucket auth: Select Public or Private depending on the read permissions for objects in the bucket.
      For a public bucket, enter the bucket name in the Bucket field.
      For a private bucket:
      - Select the Cloud and Folder where the data source is located.
      - Select a bucket or create a new one.
      - Select or create a service account with the storage.viewer role to use for accessing the data. To use a service account, you need the iam.serviceAccounts.user role.
    - Directory: Directory in the selected bucket that contains the Hadoop folder.
- Click Create.
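The rules for which fields each Bucket auth option needs can be summarized in a small validation sketch. It is a hypothetical Python model of the console form written for illustration only: the class IcebergConnectionParams, the function validate, and the field names are assumptions, not a Yandex Query or Yandex Cloud API. It also assumes Directory is optional, since the steps above do not say otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IcebergConnectionParams:
    """Hypothetical model of the console form fields (not a real API)."""
    name: str
    bucket_auth: str                       # "Public" or "Private"
    bucket: Optional[str] = None
    cloud: Optional[str] = None            # used for a private bucket only
    folder: Optional[str] = None           # used for a private bucket only
    service_account: Optional[str] = None  # needs the storage.viewer role
    directory: str = ""                    # directory with the Hadoop folder


def validate(params: IcebergConnectionParams) -> List[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    errors = []
    if not params.name:
        errors.append("Name is required")
    if params.bucket_auth not in ("Public", "Private"):
        errors.append("Bucket auth must be Public or Private")
    if not params.bucket:
        errors.append("Bucket is required")
    if params.bucket_auth == "Private":
        # A private bucket also needs its location and a service account.
        for field_name in ("cloud", "folder", "service_account"):
            if not getattr(params, field_name):
                errors.append(f"{field_name} is required for a private bucket")
    return errors
```

For a public bucket, a name and a bucket are enough, so validate(IcebergConnectionParams(name="c", bucket_auth="Public", bucket="b")) returns an empty list.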
Query syntax
To work with Iceberg tables, use the following SQL query form:
SELECT * FROM <connection>.<table_name>
Where:
- <connection>: Name of the new connection to the Iceberg folder.
- <table_name>: Name of the Iceberg table in the folder.
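If queries are generated programmatically, a minimal Python sketch like the one below can assemble the query text from a connection name and a table name. The function name build_iceberg_query and the restriction to plain identifiers (letters, digits, and underscores, not starting with a digit) are assumptions made for this illustration; they are not the full YQL identifier rules.

```python
import re

# Assumed rule for this sketch: accept only plain identifiers.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_iceberg_query(connection: str, table_name: str) -> str:
    """Build the `SELECT * FROM <connection>.<table_name>` query text."""
    for part in (connection, table_name):
        # Reject anything that is not a plain identifier instead of quoting it.
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"SELECT * FROM {connection}.{table_name}"
```

For example, build_iceberg_query("iceberg_connection", "my_table") returns the query shown at the beginning of this section.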
Limitations
Reading Iceberg tables is subject to the following limitations:
- You can only query tables created according to version 1 of the Iceberg specification.
- You can only read tables from a Hadoop folder located in Yandex Object Storage.
- You cannot read previous table states (snapshots), so time travel is not supported.
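The first and third limitations can be checked against a table's metadata file before querying it. The Python sketch below reads the format-version and current-snapshot-id fields defined by the Iceberg table specification. The function names are hypothetical, and the assumption that a query sees only the current snapshot is inferred from the time-travel limitation rather than stated by Yandex Query.

```python
import json
from typing import Optional


def supports_format(metadata: dict) -> bool:
    # Only tables created per version 1 of the Iceberg spec can be queried.
    return metadata.get("format-version") == 1


def snapshot_to_read(metadata: dict) -> Optional[int]:
    # Time travel is not supported: older entries in "snapshots" cannot be
    # selected, so only the current snapshot is relevant (assumption).
    return metadata.get("current-snapshot-id")


def check_metadata(metadata_json: str) -> Optional[int]:
    """Parse table metadata JSON and return the snapshot a query would see.

    Raises ValueError for tables whose format version is not 1.
    """
    metadata = json.loads(metadata_json)
    if not supports_format(metadata):
        raise ValueError(
            f"format-version {metadata.get('format-version')} is not supported"
        )
    return snapshot_to_read(metadata)
```

In a Hadoop folder, the latest metadata file is typically metadata/v<N>.metadata.json inside the table directory, with N recorded in metadata/version-hint.text.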
Supported data types
The following table lists the supported Iceberg data types and the corresponding YQL types.
| Iceberg data type | Yandex Query data type |
|---|---|
| boolean | Bool |
| int | Int32 |
| long | Int64 |
| float | Float |
| double | Double |
| date | Date |
| time | Utf8 |
| timestamp | Utf8 |
| string | Utf8 |
| binary | String |
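The type mapping can be encoded as a lookup, for example to find columns that fall outside the supported list before querying a table. This Python sketch copies the mapping from the table of supported types; the helper names are hypothetical, and schema_fields is assumed to follow the shape of the "fields" array in Iceberg schema JSON, where nested types (struct, list, map) appear as objects rather than strings.

```python
# Iceberg type -> Yandex Query type, copied from the table of supported types.
ICEBERG_TO_YQL = {
    "boolean": "Bool",
    "int": "Int32",
    "long": "Int64",
    "float": "Float",
    "double": "Double",
    "date": "Date",
    "time": "Utf8",
    "timestamp": "Utf8",
    "string": "Utf8",
    "binary": "String",
}


def yql_type(iceberg_type: str) -> str:
    """Map an Iceberg type name to its YQL type, or raise if unsupported."""
    try:
        return ICEBERG_TO_YQL[iceberg_type]
    except KeyError:
        raise ValueError(f"unsupported Iceberg type: {iceberg_type!r}") from None


def unsupported_columns(schema_fields: list) -> list:
    """Return names of columns whose types are not in the supported list."""
    return [
        f["name"]
        for f in schema_fields
        # Nested types are JSON objects, so they never match a string key.
        if not isinstance(f["type"], str) or f["type"] not in ICEBERG_TO_YQL
    ]
```

Note that time and timestamp values arrive as Utf8 strings, so any date arithmetic on them needs an explicit conversion in the query.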
Data pruning
A query to an Iceberg table may contain filters derived from its WHERE clause. These filters reduce the amount of data that needs to be processed; this reduction is called data pruning.
Pruning is performed both when planning the read operation and during the read itself.
When planning:
- Reading the Iceberg table metadata.
- Building the list of data files to read based on statistics from the metadata.
- Passing the selected files on for reading.
When reading:
- Splitting data files into row groups.
- Reading the statistics of each row group.
- Building the list of row groups to read based on these statistics.
- Reading data from the selected row groups.
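The two pruning stages can be illustrated with min/max statistics. The Python sketch below is a simplified model, not Yandex Query internals. It assumes a single range filter such as WHERE id BETWEEN 40 AND 60, file-level min/max values taken from table metadata (Iceberg manifests can store per-file lower and upper bounds), and per-row-group min/max values (as Parquet row groups typically provide). The classes DataFile and RowGroup and the functions may_match, plan, and read are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Stats = Dict[str, Tuple[int, int]]  # column -> (min, max)


@dataclass
class RowGroup:
    stats: Stats
    rows: List[Dict[str, int]]


@dataclass
class DataFile:
    path: str
    stats: Stats  # file-level statistics from the table metadata
    row_groups: List[RowGroup] = field(default_factory=list)


def may_match(stats: Stats, column: str, lo: int, hi: int) -> bool:
    """False only if the statistics prove no value of `column` is in [lo, hi]."""
    if column not in stats:
        return True  # without statistics, the data cannot be skipped
    cmin, cmax = stats[column]
    return cmax >= lo and cmin <= hi


def plan(files: List[DataFile], column: str, lo: int, hi: int) -> List[DataFile]:
    # Planning: keep only the files whose metadata statistics may match.
    return [f for f in files if may_match(f.stats, column, lo, hi)]


def read(files: List[DataFile], column: str, lo: int, hi: int) -> List[Dict[str, int]]:
    # Reading: skip row groups by their statistics, then apply the filter
    # to the rows of the row groups that remain.
    out = []
    for f in files:
        for rg in f.row_groups:
            if may_match(rg.stats, column, lo, hi):
                out.extend(r for r in rg.rows if lo <= r[column] <= hi)
    return out
```

With one file covering id 1 to 100 (two row groups, 1 to 50 and 51 to 100) and another covering 101 to 200, the filter id BETWEEN 40 AND 60 drops the second file at planning time, and both row groups of the first file are read because each overlaps the range. Pruning only pays off when the statistics are selective, for example when data is sorted or partitioned by the filtered column.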