Adding data to ClickHouse®

Written by

Updated at July 6, 2026

Inserting data from a file
Inserting data using buffering
- Asynchronous inserts
- Using a buffer table for inserts
Data insertion with schema specification
Example of using a schema for data insertion

To insert data into a table, use an INSERT INTO statement:

INSERT INTO db_name.table_name VALUES (v11, v12, v13), (v21, v22, v23), ...

We recommend sending insert queries no more than once per second. You can use buffering to combine multiple small queries into a single large one.

Learn more about the INSERT INTO request in this ClickHouse® guide.

Inserting data from a file

To insert data into a table from a local file, use the INSERT INTO statement in the following format:

INSERT INTO db_name.table_name FROM INFILE '<full_path_to_file>'
[COMPRESSION '<compression_format>'] FORMAT <data_format>;

The COMPRESSION allows you to transfer compressed files, which is perfect for uploading large amounts of data. This option is supported when working through the clickhouse-client or the HTTP interface. If the compression format is not specified, the system determines it based on the file extension. Supported compression formats: none, gzip, deflate, br, xz, zstd, lz4, and bz2.

For a list of supported data formats, refer to this ClickHouse® guide. To learn how to set up Cap'n Proto and Protobuf data format schemas, see Managing data format schemas.

Inserting data using buffering

When inserting data into ClickHouse®, a portion of the computational resources is consumed by overhead operations. Each INSERT query creates a new data part in ClickHouse® storage. In addition to the data rows, each part contains auxiliary metadata files. To reduce the number of data parts, ClickHouse® merges them in the background. The more merge operations required, the more resources are consumed.

As a result, a thousand individual insert queries will impose more load on the cluster than a single query inserting one thousand rows. Therefore, we recommend inserting data in batches of 1,000 to 100,000 rows.

If data arrives in small batches from different sources, use one of the following buffering mechanisms:

Asynchronous inserts (recommended)
Buffer tables

Asynchronous inserts

If the Async insert setting is enabled for a user, then all insert queries from that user are first buffered in RAM. Data is flushed from the buffer to the destination table when one of the following conditions is met:

The buffer size reaches the Async insert max data size setting.
The time since the first INSERT after the last buffer flush exceeds the value of the Async busy timeout setting.

To enable asynchronous data inserts, set Async insert to 1.

Note that asynchronous inserts do not support row deduplication.

For more details about asynchronous data inserts, see this ClickHouse® guide.

Using a buffer table for inserts

A buffer table, created in RAM, accumulates incoming data and flushes it to the target table once preset conditions are met.

To create a buffer table, use the Buffer table engine. It requires the following parameters:

Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)

Where:

database: Database name.
table: Target table name.
num_layers: Number of buffers. A table is physically stored in memory as multiple independent buffers.
min_time, max_time, min_rows, max_rows, min_bytes, and max_bytes: Conditions for flushing data from the buffer to the target table. The time is specified in seconds.

Flushing occurs when all minimum conditions are satisfied, or when any one of the maximum limits is reached. If the size of incoming data batch exceeds the max_rows or max_bytes settings, the system writes it directly to the target table, bypassing the buffer.

For details on additional Buffer engine parameters and table limitations, see this ClickHouse® guide.

Example

Create a regular table named users in the db1 database:

CREATE TABLE IF NOT EXISTS db1.users (id UInt64, name String)
ENGINE = MergeTree() ORDER BY id;

Create a buffer table named users_buffer linked to the target table users:

CREATE TABLE db1.users_buffer AS db1.users ENGINE = Buffer(db1, users, 1, 10, 100, 10000, 1000000, 10000000, 100000000);

Insert data into the buffer table:

INSERT INTO db1.users_buffer VALUES (1, 'Vlad'), (2, 'John'), (3, 'Zara');

Verify the data appears in the target table within the max_time interval of 100 seconds:
```
SELECT * FROM db1.users;
```

Result:

┌─id─┬─name─┐
│  1 │ Vlad │
│  2 │ John │
│  3 │ Zara │
└────┴──────┘

Data insertion with schema specification

In Managed Service for ClickHouse®, you can INSERT and SELECT data in different formats. Most of these formats are self-descriptive. This means that they already contain a format schema that describes valid data types, their order, and representation in this format. Thus, for example, you can insert data directly from a file.

Note

A format schema describes the format of data input or output, while a data schema describes the structure and layout of ClickHouse® databases and tables storing this data. These concepts are not interchangeable.

Cap'n Proto and Protobuf (including ProtobufSingle) data formats do not contain a format schema: data is presented in binary format without any structural details. Before using data in these formats, e.g., inserting data into a table, add a format schema to your Managed Service for ClickHouse® cluster. Thus you will be able to correctly interpret the number, order, and type of values when processing binary data.

You can add one or multiple format schemas to your Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.

Warning

To use the format schemas you added, insert the data into Managed Service for ClickHouse® via the HTTP interface. In this case, data serialization and deserialization is performed on the server side based on the schemas you added.

Learn more about data formats in this ClickHouse® guide.

Example of using a schema for data insertion

This example was tested in the following environment:

Yandex Cloud VM running Ubuntu 20.04 LTS.
Bash: 5.0.16.
clickhouse-client: 20.10.2.20.
capnproto: 0.7.0.
protobuf-compiler: 3.6.1.
Python: 3.8.5; pip3: 20.0.2.

Suppose you have created a single-host Managed Service for ClickHouse® cluster named chcluster with a database named db1. You need to insert user data into the db1.users table, where each record contains the following information:

id: User ID.
name: Username.

To insert Cap'n Proto and Protobuf user data into the db1.users table:

Getting started

Examine the input data format in order to prepare the appropriate schema definitions.

In our demonstration example, we assume the following:
- The user ID (id) is an unsigned 64-bit integer (Uint64 in Cap'n Proto and ClickHouse®, and uint64 in Protobuf).
- The username (name) is a string (Text in Cap'n Proto, string in Protobuf, String in ClickHouse®).
For more information on supported data types, see the relevant Cap'n Proto, Protobuf, and ClickHouse® guides.

Connect to the cluster and create the db1.users table if it does not exist, using the required schema:

CREATE TABLE IF NOT EXISTS db1.users (id UInt64, name String)
ENGINE = MergeTree() ORDER BY id;

Installing dependencies

sudo apt update && sudo apt install -y python3 python3-pip capnproto protobuf-compiler && \
pip3 install protobuf varint pycapnp

Defining the schemas

Create a schema definition file:
Cap'n Proto

Protobuf
user.capnp
```
@0xbec0f3f99cec4fbf;

struct User {
  id @0 :UInt64;
  name @1 :Text;
}
```
For file format details, see this Cap'n Proto guide.
user.proto
```
syntax = "proto3";

message User {
  uint64 id = 1;
  string name = 2;
};
```
For file format details, see this Protobuf guide.
Upload the file to Object Storage and retrieve its public URL.
Add your schema to the chcluster cluster:
- For the Cap'n Proto schema (user.capnp), use the name schema-capnproto.
- For the Protobuf schema (user.protobuf), use the name schema-protobuf.

Preparing scripts

The following Python scripts generate test user data in required formats and insert it into your cluster table.

Note

Python is used here solely for demonstration. The core logic of generating binary data in required format and inserting it into the table can be implemented in any programming language.

To prepare the scripts:

Generate the Python code from the user.proto Protobuf schema file:
```
protoc user.proto --python_out .
```
This command will generate the user_pb2.py file.

Create files with the following Python code:

Cap'n Proto

Protobuf

capnproto-example.py

import requests
import io
import capnp
from user_capnp import User

DB_HOST="<ClickHouse®>_host_FQDN"
DB_NAME="db1"
DB_USER="<DB_username>"
DB_PASS="<DB_user_password>"
CA_CERT="/usr/local/share/ca-certificates/Yandex/RootCA.crt"

SCHEMA_NAME = 'schema-capnproto'
SCHEMA_TYPE = "CapnProto"
SCHEMA_CLASS = "User"

def execute_query(query, data=None):
    url = 'https://{host}:8443/'.format(host=DB_HOST)
    params = {
        'database': DB_NAME,
        'query': query.strip()
    }
    auth = {
        'X-ClickHouse-User': DB_USER,
        'X-ClickHouse-Key': DB_PASS
    }

    rs = requests.post(url,
                       params=params,
                       headers=auth,
                       data=data,
                       verify=CA_CERT)

    rs.raise_for_status()

def add_user(fileobj, user_id, user_name):
    user = User.new_message()
    user.id = user_id
    user.name = user_name

    fileobj.write(user.to_bytes())

message = io.BytesIO()

add_user(message, 11, 'John')
add_user(message, 12, 'Bob')
add_user(message, 13, 'Jane')

execute_query(
    '''
    INSERT INTO {database}.users SETTINGS format_schema='{name}:{cls}' FORMAT {type}
    '''.format(database=DB_NAME,
               type=SCHEMA_TYPE,
               name=SCHEMA_NAME,
               cls=SCHEMA_CLASS), data=message.getvalue())

The script above does the following:

Imports the User class generated from the user.capnp schema (from user_capnp import User).
Queries the cluster via HTTPS with SSL encryption.
Creates a User class object, populates it with the test dataset, (def add_user ...) and serializes it into a message bitstream for I/O.
Deserializes the message bitstream into the User object using the schema-capnproto schema and inserts the resulting data into the db1.users cluster table.

protobuf-example.py

import requests
import io
import varint
from user_pb2 import User

DB_HOST="<ClickHouse®>_host_FQDN"
DB_NAME="db1"
DB_USER="<DB_username>"
DB_PASS="<DB_user_password>"
CA_CERT="/usr/local/share/ca-certificates/Yandex/RootCA.crt"

SCHEMA_NAME = 'schema-protobuf'
SCHEMA_TYPE = "Protobuf"
SCHEMA_CLASS = "User"

def execute_query(query, data=None):
    url = 'https://{host}:8443/'.format(host=DB_HOST)
    params = {
        'database': DB_NAME,
        'query': query.strip()
    }
    auth = {
        'X-ClickHouse-User': DB_USER,
        'X-ClickHouse-Key': DB_PASS
    }

    rs = requests.post(url,
                       params=params,
                       headers=auth,
                       data=data,
                       verify=CA_CERT)

    rs.raise_for_status()

def add_user(fileobj, user_id, user_name):
    user = User()
    user.id = user_id
    user.name = user_name

    fileobj.write(varint.encode(user.ByteSize()))
    fileobj.write(user.SerializeToString())

message = io.BytesIO()
add_user(message, 21, 'Stephen')
add_user(message, 22, 'Olivia')
add_user(message, 23, 'Tim')

execute_query(
    '''INSERT INTO {database}.users SETTINGS format_schema='{name}:{cls}' FORMAT {type}
    '''.format(database=DB_NAME,
               type=SCHEMA_TYPE,
               name=SCHEMA_NAME,
               cls=SCHEMA_CLASS), data=message.getvalue())

The script above does the following:

Imports the User class from the user_pb2.py module generated by compiling the relevant proto file (from user_pb2 import User).
Creates a User class object, populates it with the test dataset, (def add_user ...) and serializes it into a message bitstream for I/O.
Deserializes the message bitstream into the User object using the schema-protobuf schema and inserts the resulting data into the db1.users cluster table.

To learn how to get a host’s FQDN, see this guide.

Inserting data

Run the scripts you prepared at the previous step:
Cap'n Proto

Protobuf
```
python3 capnproto-example.py
```
```
python3 protobuf-example.py
```

Connect to the cluster and verify the data was inserted by running the SELECT query:

Cap'n Proto

Protobuf

SELECT * FROM db1.users;

┌─id─┬─name─┐
│ 11 │ John │
│ 12 │ Bob  │
│ 13 │ Jane │
└────┴──────┘

SELECT * FROM db1.users;

┌─id─┬─name────┐
│ 21 │ Stephen │
│ 22 │ Olivia  │
│ 23 │ Tim     │
└────┴─────────┘

ClickHouse® is a registered trademark of ClickHouse, Inc.

Adding data to ClickHouse®

Inserting data from a fileInserting data from a file

Inserting data using bufferingInserting data using buffering

Asynchronous insertsAsynchronous inserts

Using a buffer table for insertsUsing a buffer table for inserts

ExampleExample

Data insertion with schema specificationData insertion with schema specification

Example of using a schema for data insertionExample of using a schema for data insertion

Getting startedGetting started

Installing dependenciesInstalling dependencies

Defining the schemasDefining the schemas

Preparing scriptsPreparing scripts

Inserting dataInserting data

Was the article helpful?