Data formats and compression algorithms
Below are the data formats and compression algorithms supported in Yandex Query.
Supported data formats
Yandex Query Language supports the following data formats:
Csv_with_names
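This format is based on CSV. Data is comma-separated and stored in columns, with the first line of the file containing the column names.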
Sample data:
Year,Manufacturer,Model,Price
1997,Ford,E350,3000.00
1999,Chevy,"Venture «Extended Edition»",4900.00
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=csv_with_names,
SCHEMA
(
Year int,
Manufacturer string,
Model string,
Price double
)
)
Query results:
| # | Manufacturer | Model | Price | Year |
|---|---|---|---|---|
| 1 | Ford | E350 | 3000 | 1997 |
| 2 | Chevy | Venture «Extended Edition» | 4900 | 1999 |
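To check locally how the header line and the schema line up before uploading a file, the sample above can be parsed with a few lines of Python. This is a minimal sketch using only the standard library; the `SCHEMA` mapping and the `parse_csv_with_names` helper are illustrative names that are not part of Yandex Query, and the Python casts only approximate the types declared in the `SCHEMA` clause.

```python
import csv
import io

# Sample data from this section: the first line holds the column names.
SAMPLE = '''Year,Manufacturer,Model,Price
1997,Ford,E350,3000.00
1999,Chevy,"Venture «Extended Edition»",4900.00
'''

# Hypothetical local mirror of the SCHEMA clause: column name -> Python type.
SCHEMA = {"Year": int, "Manufacturer": str, "Model": str, "Price": float}


def parse_csv_with_names(text, schema):
    """Read CSV whose first line names the columns and cast each value."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns missing from header: {sorted(missing)}")
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in reader]


rows = parse_csv_with_names(SAMPLE, SCHEMA)
print(rows[1]["Model"], rows[1]["Price"])
```

Because values are looked up by column name rather than by position, the order of columns in the file does not have to match the order in the schema in this sketch.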
Tsv_with_names
This format is based on TSV. Data is separated by tab characters (code 0x9) and stored in columns, with the first line of the file containing the column names.
Sample data:
Year Manufacturer Model Price
1997 Ford E350 3000.00
1999 Chevy "Venture «Extended Edition»" 4900.00
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=tsv_with_names,
SCHEMA
(
Year int,
Manufacturer string,
Model string,
Price double
)
)
Query results:
| # | Manufacturer | Model | Price | Year |
|---|---|---|---|---|
| 1 | Ford | E350 | 3000 | 1997 |
| 2 | Chevy | Venture «Extended Edition» | 4900 | 1999 |
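Tab characters are easy to lose when a sample is copied from a rendered page, so it can help to build a test file with explicit separators. The sketch below is an illustration in standard-library Python, not a Yandex Query tool; the names `TAB`, `LINES`, and `tsv_text` are made up for this example.

```python
import csv
import io

TAB = "\x09"  # the tab character, code 0x9

LINES = [
    ["Year", "Manufacturer", "Model", "Price"],
    ["1997", "Ford", "E350", "3000.00"],
    ["1999", "Chevy", "Venture «Extended Edition»", "4900.00"],
]

# Build the file text with an explicit tab between every pair of fields.
tsv_text = "\n".join(TAB.join(fields) for fields in LINES) + "\n"

# Read it back: the first line supplies the column names.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter=TAB))
print(tsv_rows[1]["Model"])
```

Spaces inside a value, as in the `Model` column, are kept as part of the value because only the tab character separates fields.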
Json_list
This format is based on a JSON representation of data. Each file must contain a list of objects in valid JSON.
Example of correct data (represented as a list of JSON objects):
[
{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 },
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
]
Example of INCORRECT data (each line contains a separate object in JSON format, but these objects are not represented as a list):
{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
Json_each_row
This format is based on a JSON representation of data. Each line of a file must contain a separate object in valid JSON.
Example of correct data (each line contains a separate object in JSON format, but these objects are not represented as a list):
{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=json_each_row,
SCHEMA
(
Year int,
Manufacturer string,
Model string,
Price double
)
)
Query results:
| # | Manufacturer | Model | Price | Year |
|---|---|---|---|---|
| 1 | Ford | E350 | 3000 | 1997 |
| 2 | Chevy | Venture «Extended Edition» | 4900 | 1999 |
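The difference between the `json_list` and `json_each_row` layouts can be checked locally before a file is uploaded. The following is a minimal Python sketch under the assumption that files are UTF-8 text; the helper names `read_json_list`, `read_json_each_row`, and `each_row_to_list` are hypothetical and do not exist in Yandex Query.

```python
import json

# json_list: the whole file is one JSON array of objects.
JSON_LIST = '''[
{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 },
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
]'''

# json_each_row: every line is a separate JSON object, with no enclosing list.
JSON_EACH_ROW = '''{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }'''


def read_json_list(text):
    """Parse a json_list file: a single top-level JSON array."""
    data = json.loads(text)  # raises ValueError if the text is not one JSON value
    if not isinstance(data, list):
        raise ValueError("json_list expects a top-level JSON array")
    return data


def read_json_each_row(text):
    """Parse a json_each_row file: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def each_row_to_list(text):
    """Convert json_each_row text into json_list text."""
    return json.dumps(read_json_each_row(text), ensure_ascii=False)


# Both layouts carry the same rows once parsed with the matching reader.
same_rows = read_json_list(JSON_LIST) == read_json_each_row(JSON_EACH_ROW)
print(same_rows)
```

Passing line-per-object text to `read_json_list` fails in this sketch, which mirrors the incorrect `json_list` example shown earlier.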
Raw
This format reads raw data as is. Data read this way can then be processed using YQL.
Use this format if the built-in features for parsing source data in Yandex Query are insufficient.
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=raw,
SCHEMA
(
Data String
)
)
Query results:
Year,Manufacturer,Model,Price
1997,Ford,E350,3000.00
1999,Chevy,\"Venture «Extended Edition»\",4900.00
Json_as_string
This format is based on a JSON representation of data.
In this format, each file must contain either of the following:
- An object in valid JSON on each line of the file.
- A list of objects in valid JSON.
Example of correct data (each line contains a separate object in JSON format):
{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=json_as_string,
SCHEMA
(
Data Json
)
)
Query results:
| # | Data |
|---|---|
| 1 | {"Manufacturer": "Ford", "Model": "E350", "Price": 3000, "Year": 1997} |
| 2 | {"Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900, "Year": 1999} |
Parquet
This format allows you to read the contents of a file in Apache Parquet format.
Data compression algorithms supported in Parquet files:
- No compression.
- SNAPPY
- GZIP
- LZO
- BROTLI
- LZ4
- ZSTD
- LZ4_RAW
Request example
SELECT
*
FROM <connection>.<path>
WITH
(
format=parquet,
SCHEMA
(
Year int,
Manufacturer string,
Model string,
Price double
)
)
Query results:
| # | Manufacturer | Model | Price | Year |
|---|---|---|---|---|
| 1 | Ford | E350 | 3000 | 1997 |
| 2 | Chevy | Venture «Extended Edition» | 4900 | 1999 |
Example of reading data
Sample query for reading data from Yandex Object Storage.
SELECT
*
FROM
connection.`folder/filename.csv`
WITH(
format='csv_with_names',
SCHEMA
(
Year int,
Manufacturer String,
Model String,
Price Double
)
);
Where:
| Field | Description |
|---|---|
| connection | Yandex Object Storage connection name |
| folder/filename.csv | Path to the file in the Yandex Object Storage bucket |
| SCHEMA | Data schema description in the file |
Supported compression algorithms
Reads
Yandex Query supports the following compression algorithms for data reads:
| Compression format | Name in Query |
|---|---|
| Gzip | gzip |
| Zstd | zstd |
| LZ4 | lz4 |
| Brotli | brotli |
| Bzip2 | bzip2 |
| Xz | xz |
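To prepare a compressed test file locally, three of the formats in the table have codecs in the Python standard library. The sketch below is an illustration under that assumption and is not part of Yandex Query: the `CODECS` mapping and both helper functions are hypothetical names, and `zstd`, `lz4`, and `brotli` are left out because they need third-party packages. How the compression name is passed to a query is not shown here.

```python
import bz2
import gzip
import lzma

# Name in Query -> (compress, decompress) from the standard library.
# lzma.compress writes the .xz container by default.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

CSV_TEXT = "Year,Manufacturer,Model,Price\n1997,Ford,E350,3000.00\n"


def compress_for_query(text, name):
    """Compress UTF-8 text with the codec matching a Query compression name."""
    compress, _ = CODECS[name]
    return compress(text.encode("utf-8"))


def decompress_from_query(blob, name):
    """Reverse of compress_for_query, used here to check the round trip."""
    _, decompress = CODECS[name]
    return decompress(blob).decode("utf-8")


# Round-trip the sample through every codec in the mapping.
blobs = {name: compress_for_query(CSV_TEXT, name) for name in CODECS}
restored = {name: decompress_from_query(blob, name) for name, blob in blobs.items()}
print(all(text == CSV_TEXT for text in restored.values()))
```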
The parquet file format supports its own internal compression algorithms. Yandex Query enables reading data in parquet format using the following compression algorithms:
| Compression format | Name in Query |
|---|---|
| Raw | raw |
| Snappy | snappy |
Writing data to Yandex Object Storage
Currently, the following data write formats are supported:
| Data format | Name in Query |
|---|---|
| CSV | csv_with_names |
| Parquet | parquet |
Query supports the following compression algorithms for data writes:
| Compression format | Name in Query |
|---|---|
| Gzip | gzip |
| Zstd | zstd |
| LZ4 | lz4 |
| Brotli | brotli |
| Bzip2 | bzip2 |
| Xz | xz |
The parquet file format supports its own internal compression algorithms. Query allows writing data in parquet format using the following compression algorithms:
| Compression format | Name in Query |
|---|---|
| Snappy | No name (used by default) |
Writing data to Yandex Data Streams
Data Streams only lets you write data as a byte stream that is interpreted on the receiving side.
File format and compression algorithm settings are not applied to data writes in Data Streams.
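Since the stream carries only bytes, the producer and the consumer have to agree on how those bytes are interpreted. The sketch below illustrates one such convention, UTF-8 encoded JSON, which is purely an assumption for this example; it does not call any Data Streams API, and the `to_message` and `from_message` helpers are hypothetical.

```python
import json


def to_message(row):
    """Producer side: serialize one row to bytes (UTF-8 JSON is an assumption)."""
    return json.dumps(row, ensure_ascii=False).encode("utf-8")


def from_message(payload):
    """Consumer side: interpret the bytes using the same convention."""
    return json.loads(payload.decode("utf-8"))


row = {"Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.0}
payload = to_message(row)
decoded = from_message(payload)
print(decoded == row)
```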