Data formats and compression algorithms

Written by Yandex Cloud. Improved by Max Z. Updated on November 14, 2024.

  • Supported data formats
  • Example of reading data
  • Supported compression algorithms
    • Reads
    • Writing data to Yandex Object Storage
    • Writing data to Yandex Data Streams

Below are the data formats and compression algorithms supported in Yandex Query.

Supported data formats

Yandex Query Language supports the following data formats:

  • csv_with_names.
  • tsv_with_names.
  • json_list.
  • json_each_row.
  • raw.
  • json_as_string.
  • parquet.

csv_with_names

This format is based on the CSV format. Values are comma-separated and stored in columns, with the first line of the file containing the column names.

Sample data:

Year,Manufacturer,Model,Price
1997,Ford,E350,3000.00
1999,Chevy,"Venture «Extended Edition»",4900.00
Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=csv_with_names,
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)

Query results:

#  Manufacturer  Model                       Price  Year
1  Ford          E350                        3000   1997
2  Chevy         Venture «Extended Edition»  4900   1999

tsv_with_names

This format is based on the TSV format. Values are separated by tab characters (code 0x9) and stored in columns, with the first line of the file containing the column names.

Sample data:

Year    Manufacturer    Model   Price
1997    Ford    E350    3000.00
1999    Chevy   "Venture «Extended Edition»"    4900.00
Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=tsv_with_names,
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)

Query results:

#  Manufacturer  Model                       Price  Year
1  Ford          E350                        3000   1997
2  Chevy         Venture «Extended Edition»  4900   1999

json_list

This format is based on the JSON representation of data. In this format, each file must contain a list of objects in a valid JSON representation.

Example of correct data (represented as a list of JSON objects):

[
    { "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 },
    { "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
]

Example of INCORRECT data (each line contains a separate object in JSON format, but these objects are not represented as a list):

{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }

json_each_row

This format is based on the JSON representation of data. In this format, each line of the file must contain a valid JSON object, without combining these objects into a JSON list. This format is used when transferring data via streaming systems, such as Yandex Data Streams.

Example of correct data (each line contains a separate object in JSON format, but these objects are not represented as a list):

{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 },
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=json_each_row,
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)

Query results:

#  Manufacturer  Model                       Price  Year
1  Ford          E350                        3000   1997
2  Chevy         Venture «Extended Edition»  4900   1999

raw

This format allows reading raw data as is. The data read this way can be processed using YQL tools by splitting it into rows and columns.

Use this format if the built-in features for parsing source data in Yandex Query are insufficient.

Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=raw,
    SCHEMA
    (
        Data String
    )
)

Query results:

Year,Manufacturer,Model,Price
1997,Ford,E350,3000.00
1999,Chevy,\"Venture «Extended Edition»\",4900.00

json_as_string

This format is based on the JSON representation of data. It does not split an input JSON document into fields; instead, it represents each file line as a single JSON object (or a single string). This format is convenient when the set of fields is not fixed and may vary between messages.

In this format, each file must contain one of the following:

  • An object in a valid JSON representation on each line.
  • A list of objects in a valid JSON representation.

Example of correct data (a separate JSON object on each line):

{ "Year": 1997, "Manufacturer": "Ford", "Model": "E350", "Price": 3000.0 }
{ "Year": 1999, "Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900.00 }
Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=json_as_string,
    SCHEMA
    (
        Data Json
    )
)

Query results:

# Data
1 {"Manufacturer": "Ford", "Model": "E350", "Price": 3000, "Year": 1997}
2 {"Manufacturer": "Chevy", "Model": "Venture «Extended Edition»", "Price": 4900, "Year": 1999}

parquet

This format allows you to read the contents of a file in Apache Parquet format.

Data compression algorithms supported in Parquet files:

  • No compression
  • SNAPPY
  • GZIP
  • LZO
  • BROTLI
  • LZ4
  • ZSTD
  • LZ4_RAW
Request example
SELECT
    *
FROM <connection>.<path>
WITH
(
    format=parquet,
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)

Query results:

#  Manufacturer  Model                       Price  Year
1  Ford          E350                        3000   1997
2  Chevy         Venture «Extended Edition»  4900   1999

Example of reading data

Sample query for reading data from Yandex Object Storage:

SELECT
    *
FROM
    connection.`folder/filename.csv`
WITH
(
    format='csv_with_names',
    SCHEMA
    (
        Year int,
        Manufacturer String,
        Model String,
        Price Double
    )
);

Where:

Field                Description
connection           Name of the Yandex Object Storage connection
folder/filename.csv  Path to the file in the Yandex Object Storage bucket
SCHEMA               Description of the data schema in the file

Supported compression algorithms

Reads

Yandex Query supports the following compression algorithms for data reads:

Compression format  Name in Query
Gzip                gzip
Zstd                zstd
LZ4                 lz4
Brotli              brotli
Bzip2               bzip2
Xz                  xz
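
The name from the "Name in Query" column is passed when reading the data. Below is a hedged sketch, reusing the placeholders and schema from the read examples above and assuming the parameter is called compression:

SELECT
    *
FROM <connection>.<path>
WITH
(
    format=csv_with_names,
    compression="gzip",   -- name taken from the table above
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)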

The Parquet file format supports its own internal compression algorithms. Yandex Query can read data in Parquet format compressed with the following algorithms:

Compression format  Name in Query
Raw                 raw
Snappy              snappy

Writing data to Yandex Object Storage

Currently, the following data write formats are supported:

Data format  Name in Query
CSV          csv_with_names
Parquet      parquet

Yandex Query supports the following compression algorithms for data writes:

Compression format  Name in Query
Gzip                gzip
Zstd                zstd
LZ4                 lz4
Brotli              brotli
Bzip2               bzip2
Xz                  xz

The Parquet file format supports its own internal compression algorithms. Yandex Query can write data in Parquet format using the following compression algorithms:

Compression format  Name in Query
Snappy              No name required; used by default
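
As a rough, hedged illustration (not part of the original article), writing query results back to Object Storage might look as follows. It assumes that INSERT INTO accepts the same WITH-style parameters (format and compression) shown for reads; the connection and path names are placeholders.

INSERT INTO <connection>.<output_path>
WITH
(
    format=csv_with_names,   -- from the data write formats table
    compression="gzip"       -- from the compression table; assumed parameter name
)
SELECT
    Year,
    Manufacturer,
    Model,
    Price
FROM <connection>.<path>
WITH
(
    format=parquet,
    SCHEMA
    (
        Year int,
        Manufacturer string,
        Model string,
        Price double
    )
)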

Writing data to Yandex Data Streams

Data Streams only lets you write data as a byte stream that is interpreted on the receiving side.

File format and compression settings are not applied when writing data to Data Streams.
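
As a hedged illustration (not from the original article), a write to Data Streams might look like the sketch below. The Data Streams connection and stream names are placeholders, and the single selected column already contains the serialized payload, since no format or compression settings are applied on write.

INSERT INTO <yds_connection>.<stream_name>
SELECT
    Data   -- already-serialized string written to the stream as a byte payload
FROM <connection>.<path>
WITH
(
    format=raw,
    SCHEMA
    (
        Data String
    )
)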
