© 2026 Direct Cursus Technology L.L.C.
Adding data to ClickHouse®

Written by
Yandex Cloud
Updated on June 3, 2026
  • Inserting data from a file
  • Inserting data using buffering
    • Asynchronous inserts
    • Using a buffer table for inserts
  • Data insertion with schema specification
  • Example of using a schema for data insertion
    • Getting started
    • Installing dependencies
    • Defining the schemas
    • Preparing scripts
    • Inserting data

To insert data into a table, use an INSERT INTO statement:

INSERT INTO db_name.table_name VALUES (v11, v12, v13), (v21, v22, v23), ...

We recommend sending insert queries no more than once per second. You can use buffering to combine multiple small queries into a single large one.

Learn more about INSERT INTO from this ClickHouse® guide.
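As a rough illustration of combining several rows into one statement, the sketch below renders a list of Python tuples as a single multi-row INSERT INTO ... VALUES query. The helper names (quote_value, build_insert) are hypothetical and not part of any ClickHouse® client library, and the escaping covers only integers, floats, and plain strings. In real code, a client library's parameter binding or a data format such as TabSeparated is usually a safer choice.

```python
def quote_value(value):
    """Render a Python value as a SQL literal (integers, floats, and strings only)."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Escape backslashes first, then single quotes.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    raise TypeError("unsupported value type: " + type(value).__name__)


def build_insert(table, rows):
    """Build one INSERT statement that carries all the given rows."""
    rendered = ", ".join(
        "(" + ", ".join(quote_value(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO {} VALUES {}".format(table, rendered)


query = build_insert("db1.users", [(1, "Vlad"), (2, "John"), (3, "Zara")])
print(query)
```

Sending one such statement per second, rather than one statement per row, follows the recommendation above.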

Inserting data from a file

To insert data into a table from a local file, use the INSERT INTO statement in the following format:

INSERT INTO db_name.table_name FROM INFILE '<full_path_to_file>'
[COMPRESSION '<compression_format>'] FORMAT <data_format>;

The COMPRESSION clause lets you transfer compressed files, which is useful when uploading large amounts of data. It is supported in clickhouse-client and the HTTP interface. If the compression format is not specified, the system determines it from the file extension. Supported compression formats: none, gzip, deflate, br, xz, zstd, lz4, and bz2.

For a list of supported data formats, refer to this ClickHouse® guide. To learn how to set up Cap'n Proto and Protobuf data format schemas, see Managing data format schemas.
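The statement format above can be assembled programmatically. The following is a minimal sketch: build_infile_insert is a hypothetical helper, the list of compression formats is copied from the paragraph above, and the file path is only an example. The helper builds the query text and does not run it.

```python
# Compression formats listed in this article.
SUPPORTED_COMPRESSION = {"none", "gzip", "deflate", "br", "xz", "zstd", "lz4", "bz2"}


def build_infile_insert(table, path, data_format, compression=None):
    """Build an INSERT ... FROM INFILE statement; COMPRESSION is optional."""
    if compression is not None and compression not in SUPPORTED_COMPRESSION:
        raise ValueError("unsupported compression format: " + compression)
    # Escape backslashes and single quotes in the path literal.
    escaped = path.replace("\\", "\\\\").replace("'", "\\'")
    parts = ["INSERT INTO {} FROM INFILE '{}'".format(table, escaped)]
    if compression is not None:
        parts.append("COMPRESSION '{}'".format(compression))
    parts.append("FORMAT {}".format(data_format))
    return " ".join(parts) + ";"


print(build_infile_insert("db1.users", "/tmp/users.csv.gz", "CSV", "gzip"))
```

Omitting the compression argument leaves the COMPRESSION clause out, so the format is determined from the file extension as described above.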

Inserting data using buffering

When inserting data into ClickHouse®, a portion of the computational resources is consumed by overhead operations. Each INSERT query creates a new data part in ClickHouse® storage. In addition to the data rows, each part contains auxiliary metadata files. To reduce the number of data parts, ClickHouse® merges them in the background. The more merge operations required, the more resources are consumed.

As a result, a thousand individual insert queries will impose more load on the cluster than a single query inserting one thousand rows. Therefore, we recommend inserting data in batches of 1,000 to 100,000 rows.
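On the client side, batching can be as simple as grouping rows before sending them. The generator below is a minimal sketch: batch_rows is a hypothetical name, and the default batch size of 10,000 is just one value within the recommended range.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_rows(rows: Iterable[T], batch_size: int = 10_000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


for batch in batch_rows(range(25_000)):
    print(len(batch))  # each batch would go into one INSERT query
```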

If data arrives in small batches from different sources, use one of the following buffering mechanisms:

  • Asynchronous inserts (recommended)
  • Buffer tables

Asynchronous inserts

If the Async insert setting is enabled for a user, then all insert queries from that user are first buffered in RAM. Data is flushed from the buffer to the destination table when one of the following conditions is met:

  • The buffer size reaches the Async insert max data size setting.
  • The time since the first INSERT after the last buffer flush exceeds the value of the Async busy timeout setting.

To enable asynchronous data inserts, set Async insert to 1.

Note that asynchronous inserts do not support row deduplication.

For more details about asynchronous data inserts, see this ClickHouse® guide.
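To make the two flush conditions concrete, here is a toy in-memory model. It is not how ClickHouse® implements asynchronous inserts. The class name and its parameters (max_data_size in bytes, busy_timeout in seconds) are assumptions that loosely mirror the settings described above, and time is passed in explicitly so the behavior is easy to follow.

```python
class AsyncInsertBufferModel:
    """Toy model of the size-based and time-based flush conditions."""

    def __init__(self, max_data_size: int, busy_timeout: float):
        self.max_data_size = max_data_size
        self.busy_timeout = busy_timeout
        self.pending = []            # payloads waiting in the buffer
        self.pending_bytes = 0
        self.first_insert_at = None  # time of the first insert since the last flush
        self.flushed = []            # one entry per flush to the destination table

    def insert(self, payload: bytes, now: float) -> None:
        self.tick(now)
        if self.first_insert_at is None:
            self.first_insert_at = now
        self.pending.append(payload)
        self.pending_bytes += len(payload)
        # Condition 1: the buffer size reaches the configured maximum.
        if self.pending_bytes >= self.max_data_size:
            self._flush()

    def tick(self, now: float) -> None:
        # Condition 2: too much time has passed since the first buffered insert.
        if self.first_insert_at is not None and now - self.first_insert_at > self.busy_timeout:
            self._flush()

    def _flush(self) -> None:
        if self.pending:
            self.flushed.append(b"".join(self.pending))
        self.pending = []
        self.pending_bytes = 0
        self.first_insert_at = None


buf = AsyncInsertBufferModel(max_data_size=10, busy_timeout=1.0)
buf.insert(b"aaaa", now=0.0)    # stays in the buffer
buf.insert(b"bbbbbb", now=0.5)  # buffer reaches 10 bytes: flushed by size
buf.insert(b"cc", now=0.6)      # starts a new buffer
buf.tick(now=2.0)               # more than 1 second later: flushed by timeout
print(len(buf.flushed))
```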

Using a buffer table for inserts

A buffer table, created in RAM, accumulates incoming data and flushes it to the target table once preset conditions are met.

To create a buffer table, use the Buffer table engine. It requires the following parameters:

Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)

Where:

  • database: Database name.
  • table: Target table name.
  • num_layers: Number of buffers. A table is physically stored in memory as multiple independent buffers.
  • min_time, max_time, min_rows, max_rows, min_bytes, and max_bytes: Conditions for flushing data from the buffer to the target table. The time is specified in seconds.

Flushing occurs when all the minimum conditions are satisfied or when any one of the maximum limits is reached. If the size of an incoming data batch exceeds max_rows or max_bytes, the system writes it directly to the target table, bypassing the buffer.

For details on additional Buffer engine parameters and table limitations, see this ClickHouse® guide.
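The flush rule can be expressed as a small predicate. This is a sketch of the rule as described in this article, not the engine's actual code: BufferLimits, should_flush, and bypasses_buffer are hypothetical names, and the exact behavior at the boundary values (>= versus >) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferLimits:
    min_time: float  # seconds
    max_time: float  # seconds
    min_rows: int
    max_rows: int
    min_bytes: int
    max_bytes: int


def should_flush(limits: BufferLimits, elapsed: float, rows: int, size_bytes: int) -> bool:
    """Flush when ALL minimums are satisfied or ANY maximum is reached."""
    all_min = (
        elapsed >= limits.min_time
        and rows >= limits.min_rows
        and size_bytes >= limits.min_bytes
    )
    any_max = (
        elapsed >= limits.max_time
        or rows >= limits.max_rows
        or size_bytes >= limits.max_bytes
    )
    return all_min or any_max


def bypasses_buffer(limits: BufferLimits, rows: int, size_bytes: int) -> bool:
    """A batch larger than max_rows or max_bytes goes straight to the target table."""
    return rows > limits.max_rows or size_bytes > limits.max_bytes


# Limits matching Buffer(db1, users, 1, 10, 100, 10000, 1000000, 10000000, 100000000).
limits = BufferLimits(10, 100, 10_000, 1_000_000, 10_000_000, 100_000_000)
print(should_flush(limits, elapsed=5, rows=3, size_bytes=30))    # no limit met yet
print(should_flush(limits, elapsed=101, rows=3, size_bytes=30))  # max_time reached
```

With these limits, a three-row insert does not satisfy the minimum row and byte conditions, so in this model it is flushed only once max_time is reached.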

Example

  1. Create a regular table named users in the db1 database:

    CREATE TABLE IF NOT EXISTS db1.users (id UInt64, name String)
    ENGINE = MergeTree() ORDER BY id;
    
  2. Create a buffer table named users_buffer linked to the target table users:

    CREATE TABLE db1.users_buffer AS db1.users ENGINE = Buffer(db1, users, 1, 10, 100, 10000, 1000000, 10000000, 100000000);
    
  3. Insert data into the buffer table:

    INSERT INTO db1.users_buffer VALUES (1, 'Vlad'), (2, 'John'), (3, 'Zara');
    
  4. Verify the data appears in the target table within the max_time interval of 100 seconds:

    SELECT * FROM db1.users;
    

Result:

┌─id─┬─name─┐
│  1 │ Vlad │
│  2 │ John │
│  3 │ Zara │
└────┴──────┘

Data insertion with schema specification

In Managed Service for ClickHouse®, you can INSERT and SELECT data in different formats. Most of these formats are self-describing: they already contain a format schema that describes the valid data types, their order, and their representation in the format. This is why, for example, you can insert data directly from a file.

Note

A format schema describes the format of data input or output, while a data schema describes the structure and layout of ClickHouse® databases and tables storing this data. These concepts are not interchangeable.

Cap'n Proto and Protobuf (including ProtobufSingle) data formats do not contain a format schema: data is presented in binary format without any structural details. Before using data in these formats, e.g., inserting it into a table, add a format schema to your Managed Service for ClickHouse® cluster. The schema allows the cluster to correctly interpret the number, order, and type of values when processing binary data.

You can add one or multiple format schemas to your Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.

Warning

To use the format schemas you added, insert the data into Managed Service for ClickHouse® via the HTTP interface. In this case, data serialization and deserialization are performed on the server side based on the schemas you added.

For more information about data formats, see this ClickHouse® guide.

Example of using a schema for data insertion

This example was tested in the following environment:

  • Yandex Cloud VM running Ubuntu 20.04 LTS.
  • Bash: 5.0.16.
  • clickhouse-client: 20.10.2.20.
  • capnproto: 0.7.0.
  • protobuf-compiler: 3.6.1.
  • Python: 3.8.5; pip3: 20.0.2.

Suppose you have created a single-host Managed Service for ClickHouse® cluster named chcluster with a database named db1. You need to insert user data into the db1.users table, where each record contains the following information:

  • id: User ID.
  • name: Username.

To insert Cap'n Proto and Protobuf user data into the db1.users table:

  1. Install the dependencies.
  2. Prepare the schemas.
  3. Prepare scripts.
  4. Insert data.

Getting started

  1. Examine the input data format in order to prepare the appropriate schema definitions.

    In our demonstration example, we assume the following:

    • The user ID (id) is an unsigned 64-bit integer (UInt64 in Cap'n Proto and ClickHouse®, and uint64 in Protobuf).
    • The username (name) is a string (Text in Cap'n Proto, string in Protobuf, String in ClickHouse®).

    For more information on supported data types, see the relevant Cap'n Proto, Protobuf, and ClickHouse® guides.

  2. Connect to the cluster and create the db1.users table if it does not exist, using the required schema:

    CREATE TABLE IF NOT EXISTS db1.users (id UInt64, name String)
    ENGINE = MergeTree() ORDER BY id;
    

Installing dependencies

sudo apt update && sudo apt install -y python3 python3-pip capnproto protobuf-compiler && \
pip3 install protobuf varint pycapnp

Defining the schemas

  1. Create a schema definition file:

    Cap'n Proto: user.capnp

    @0xbec0f3f99cec4fbf;
    
    struct User {
      id @0 :UInt64;
      name @1 :Text;
    }
    

    For file format details, see this Cap'n Proto guide.

    Protobuf: user.proto

    syntax = "proto3";
    
    message User {
      uint64 id = 1;
      string name = 2;
    };
    

    For file format details, see this Protobuf guide.

  2. Upload the file to Object Storage and retrieve its public URL.

  3. Add your schema to the chcluster cluster:

    • For the Cap'n Proto schema (user.capnp), use the name schema-capnproto.
    • For the Protobuf schema (user.proto), use the name schema-protobuf.

Preparing scripts

The following Python scripts generate test user data in required formats and insert it into your cluster table.

Note

Python is used here solely for demonstration. The core logic of generating binary data in required format and inserting it into the table can be implemented in any programming language.

To prepare the scripts:

  1. Generate the Python code from the user.proto Protobuf schema file:

    protoc user.proto --python_out .
    

    This command will generate the user_pb2.py file.

  2. Create files with the following Python code:

    Cap'n Proto: capnproto-example.py

    import requests
    import io
    import capnp
    from user_capnp import User
    
    DB_HOST="<ClickHouse®_host_FQDN>"
    DB_NAME="db1"
    DB_USER="<DB_username>"
    DB_PASS="<DB_user_password>"
    CA_CERT="/usr/local/share/ca-certificates/Yandex/RootCA.crt"
    
    SCHEMA_NAME = 'schema-capnproto'
    SCHEMA_TYPE = "CapnProto"
    SCHEMA_CLASS = "User"
    
    def execute_query(query, data=None):
        url = 'https://{host}:8443/'.format(host=DB_HOST)
        params = {
            'database': DB_NAME,
            'query': query.strip()
        }
        auth = {
            'X-ClickHouse-User': DB_USER,
            'X-ClickHouse-Key': DB_PASS
        }
    
        rs = requests.post(url,
                           params=params,
                           headers=auth,
                           data=data,
                           verify=CA_CERT)
    
        rs.raise_for_status()
    
    def add_user(fileobj, user_id, user_name):
        user = User.new_message()
        user.id = user_id
        user.name = user_name
    
        fileobj.write(user.to_bytes())
    
    message = io.BytesIO()
    
    add_user(message, 11, 'John')
    add_user(message, 12, 'Bob')
    add_user(message, 13, 'Jane')
    
    execute_query(
        '''
        INSERT INTO {database}.users SETTINGS format_schema='{name}:{cls}' FORMAT {type}
        '''.format(database=DB_NAME,
                   type=SCHEMA_TYPE,
                   name=SCHEMA_NAME,
                   cls=SCHEMA_CLASS), data=message.getvalue())
    

    The script above does the following:

    1. Imports the User class generated from the user.capnp schema (from user_capnp import User).
    2. Queries the cluster via HTTPS with SSL encryption.
    3. Creates a User class object, populates it with the test data (see def add_user ...), and serializes it into a binary message.
    4. Sends the message to the cluster, where it is deserialized using the schema-capnproto schema, and the resulting data is inserted into the db1.users table.

    Protobuf: protobuf-example.py

    import requests
    import io
    import varint
    from user_pb2 import User
    
    DB_HOST="<ClickHouse®_host_FQDN>"
    DB_NAME="db1"
    DB_USER="<DB_username>"
    DB_PASS="<DB_user_password>"
    CA_CERT="/usr/local/share/ca-certificates/Yandex/RootCA.crt"
    
    SCHEMA_NAME = 'schema-protobuf'
    SCHEMA_TYPE = "Protobuf"
    SCHEMA_CLASS = "User"
    
    def execute_query(query, data=None):
        url = 'https://{host}:8443/'.format(host=DB_HOST)
        params = {
            'database': DB_NAME,
            'query': query.strip()
        }
        auth = {
            'X-ClickHouse-User': DB_USER,
            'X-ClickHouse-Key': DB_PASS
        }
    
        rs = requests.post(url,
                           params=params,
                           headers=auth,
                           data=data,
                           verify=CA_CERT)
    
        rs.raise_for_status()
    
    def add_user(fileobj, user_id, user_name):
        user = User()
        user.id = user_id
        user.name = user_name
    
        fileobj.write(varint.encode(user.ByteSize()))
        fileobj.write(user.SerializeToString())
    
    message = io.BytesIO()
    add_user(message, 21, 'Stephen')
    add_user(message, 22, 'Olivia')
    add_user(message, 23, 'Tim')
    
    execute_query(
        '''INSERT INTO {database}.users SETTINGS format_schema='{name}:{cls}' FORMAT {type}
        '''.format(database=DB_NAME,
                   type=SCHEMA_TYPE,
                   name=SCHEMA_NAME,
                   cls=SCHEMA_CLASS), data=message.getvalue())
    

    The script above does the following:

    1. Imports the User class from the user_pb2.py module generated by compiling the relevant proto file (from user_pb2 import User).
    2. Creates a User class object, populates it with the test data (see def add_user ...), and serializes it into a binary message.
    3. Sends the message to the cluster, where it is deserialized using the schema-protobuf schema, and the resulting data is inserted into the db1.users table.

    To learn how to get a host’s FQDN, see this guide.
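In the Protobuf script, each serialized User message is written after its length, encoded as a varint (the call to varint.encode(user.ByteSize())). The dependency-free sketch below shows what that length prefix looks like. The function names encode_varint and frame_message are hypothetical, and the code only illustrates the framing. It does not replace the varint package used above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("varint length prefixes must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F      # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(byte)
            return bytes(out)


def frame_message(serialized: bytes) -> bytes:
    """Prefix a serialized message with its varint-encoded length."""
    return encode_varint(len(serialized)) + serialized


# Any serialized message bytes can be framed this way and concatenated.
stream = frame_message(b"abc") + frame_message(b"de")
print(stream)
```

Writing several framed messages one after another yields a single request body that carries multiple rows.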

Inserting data

  1. Run the scripts you prepared at the previous step:

    Cap'n Proto:

    python3 capnproto-example.py

    Protobuf:

    python3 protobuf-example.py
    
  2. Connect to the cluster and verify the data was inserted by running the SELECT query:

    Cap'n Proto:

    SELECT * FROM db1.users;

    ┌─id─┬─name─┐
    │ 11 │ John │
    │ 12 │ Bob  │
    │ 13 │ Jane │
    └────┴──────┘

    Protobuf:

    SELECT * FROM db1.users;

    ┌─id─┬─name────┐
    │ 21 │ Stephen │
    │ 22 │ Olivia  │
    │ 23 │ Tim     │
    └────┴─────────┘
    

ClickHouse® is a registered trademark of ClickHouse, Inc.
