Fetching data from RabbitMQ to Managed Service for ClickHouse®
You can supply data from RabbitMQ to a Managed Service for ClickHouse® cluster in real time. Managed Service for ClickHouse® automatically inserts the data routed to particular queues of the specified RabbitMQ exchanges into a RabbitMQ engine table.
To set up data delivery from RabbitMQ to Managed Service for ClickHouse®:
- Set up RabbitMQ integration for your Managed Service for ClickHouse® cluster.
- Create a RabbitMQ table in the Managed Service for ClickHouse® cluster.
- Send the test data to the RabbitMQ queue.
- Make sure the test data appears in the Managed Service for ClickHouse® cluster table.
If you no longer need the resources you created, delete them.
Required paid resources
The cost of supporting this solution includes:
- Managed Service for ClickHouse® cluster fee, which covers the use of computing resources allocated to hosts (including ZooKeeper hosts) and disk space (see Managed Service for ClickHouse® pricing).
- Fee for public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
- VM fee, which covers the use of computing resources, storage, and, optionally, public IP address (see Compute Cloud pricing).
Getting started
Set up your infrastructure
To set up the infrastructure manually:
- Create a Managed Service for ClickHouse® cluster with your preferred configuration and add a database named db1. Enable public access to the cluster during creation so you can connect to it from your local machine. Connections from within the Yandex Cloud network are enabled by default.
  Note
  Public access to cluster hosts is required if you plan to connect to the cluster via the internet. This connection option is simpler and is recommended for the purposes of this guide. You can also connect to non-public hosts, but only from Yandex Cloud virtual machines located in the same cloud network as the cluster.
  Integration with RabbitMQ is available during cluster setup. In this example, however, we will configure the integration at a later stage.
- Create a virtual machine for RabbitMQ. Enable public access to the VM during creation so you can connect to it from your local machine. Connections from the Yandex Cloud network are enabled by default.
To set up the infrastructure using Terraform:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Download the clickhouse-cluster-and-vm-for-rabbitmq.tf configuration file to your current working directory. This file describes:
  - Network.
  - Subnet.
  - Default security group and inbound internet rules for your cluster and VM.
  - Managed Service for ClickHouse® cluster.
  - Virtual machine.
- In the clickhouse-cluster-and-vm-for-rabbitmq.tf file, specify the following:
  - Username and password that will be used to access the Managed Service for ClickHouse® cluster.
  - ID of the public, non-GPU Ubuntu image to use for the VM.
  - Username and path to the SSH public key for VM access. By default, the pre-configured image ignores the specified username and automatically creates a user named ubuntu. Use it to connect to the VM.
- Validate your Terraform configuration files using this command:
    terraform validate
  Terraform will display any configuration errors detected in your files.
- Create the required infrastructure:
  - Run this command to view the planned changes:
      terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
        terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
Configure additional settings
- Connect to the virtual machine over SSH.
- Install RabbitMQ:
    sudo apt update && sudo apt install rabbitmq-server --yes
- Create a user for RabbitMQ:
    sudo rabbitmqctl add_user <username> <password>
- Grant this user permissions to connect to the server:
    sudo rabbitmqctl set_permissions -p / <username> ".*" ".*" ".*" && \
    sudo rabbitmqctl set_topic_permissions -p / <username> amq.topic "cars" "cars"
- Install the amqp-publish and amqp-declare-queue tools for RabbitMQ and jq for processing JSON streams:
    sudo apt update && sudo apt install amqp-tools --yes && sudo apt-get install jq --yes
- Use amqp-declare-queue to create a queue named cars in RabbitMQ:
    amqp-declare-queue \
        --url=amqp://<username>:<password>@<IP_address_or_FQDN_of_the_RabbitMQ_server>:5672 \
        --queue=cars
- Install clickhouse-client to connect to the database in your Managed Service for ClickHouse® cluster.
  - Add the ClickHouse® DEB repository:
      sudo apt update && sudo apt install --yes apt-transport-https ca-certificates dirmngr && \
      sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4 && \
      echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee \
      /etc/apt/sources.list.d/clickhouse.list
  - Install the dependencies:
      sudo apt update && sudo apt install clickhouse-client --yes
  - Download the clickhouse-client configuration file:
      mkdir -p ~/.clickhouse-client && \
      wget "https://storage.yandexcloud.net/doc-files/clickhouse-client.conf.example" \
          --output-document ~/.clickhouse-client/config.xml
  Verify that you can establish an SSL connection to the Managed Service for ClickHouse® cluster via clickhouse-client.
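The amqp-declare-queue step above passes the RabbitMQ credentials inside an AMQP URL. If the username or password contains reserved characters such as @, / or :, the URL generally needs them percent-encoded. The following Python sketch shows one way to build such a URL. The function name build_amqp_url and the sample credentials are hypothetical and are not part of this guide's tooling.

```python
from urllib.parse import quote


def build_amqp_url(username: str, password: str, host: str, port: int = 5672) -> str:
    """Return an amqp:// URL with percent-encoded credentials.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() escape every reserved character,
    # including "/" and "@", which would otherwise break the URL.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"amqp://{user}:{secret}@{host}:{port}"


if __name__ == "__main__":
    # Made-up credentials and a documentation-range IP address.
    print(build_amqp_url("carsuser", "p@ss/word", "203.0.113.10"))
```

The resulting string can be used as the --url value. Like the commands in this guide, the sketch does not include a virtual host in the URL.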
Set up the RabbitMQ integration for the Managed Service for ClickHouse® cluster
To configure the integration manually, open the Managed Service for ClickHouse® cluster settings, navigate to the DBMS settings → Rabbitmq section, and specify the username and password for RabbitMQ authentication.
To configure the integration using Terraform, add the clickhouse.config.rabbitmq block containing the RabbitMQ username and password to the cluster configuration:
    resource "yandex_mdb_clickhouse_cluster" "clickhouse-cluster" {
      ...
      clickhouse {
        ...
        config {
          rabbitmq {
            username = "<username>"
            password = "<password>"
          }
        }
        ...
      }
    }
Create a RabbitMQ engine table in your Managed Service for ClickHouse® cluster
Suppose you publish the following car sensor data in JSON format to the RabbitMQ exchange named exchange, which routes it to the cars queue:
- device_id: Device string identifier.
- datetime: Date and time of data generation in YYYY-MM-DD HH:MM:SS format.
- Car coordinates:
  - latitude: Latitude.
  - longitude: Longitude.
  - altitude: Height above mean sea level.
- speed: Current speed.
- battery_voltage: Battery voltage for electric cars. null for ICE vehicles.
- cabin_temperature: Temperature inside the car.
- fuel_level: Fuel level for ICE cars. null for electric cars.
This data will be transmitted as RabbitMQ messages. Each message will contain a string with a serialized JSON object in the following structure:
{"device_id":"iv9a94th6rzt********","datetime":"2020-06-05 17:27:00","latitude":"55.70329032","longitude":"37.65472196","altitude":"427.5","speed":"0","battery_voltage":"23.5","cabin_temperature":"17","fuel_level":null}
For table inserts, the Managed Service for ClickHouse® cluster will use the JSONEachRow format, in which each row is a separate JSON object on its own line.
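To illustrate the JSONEachRow layout, here is a minimal Python sketch that splits a message payload into rows the way a line-oriented reader would. The function name parse_json_each_row is hypothetical. ClickHouse® performs its own parsing, so this only shows the shape of the data and not the service's implementation.

```python
import json


def parse_json_each_row(payload: str) -> list:
    """Parse a JSONEachRow-style payload: one JSON object per non-empty line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


if __name__ == "__main__":
    # The sample message from this guide, sent as a single line.
    message = (
        '{"device_id":"iv9a94th6rzt********","datetime":"2020-06-05 17:27:00",'
        '"latitude":"55.70329032","longitude":"37.65472196","altitude":"427.5",'
        '"speed":"0","battery_voltage":"23.5","cabin_temperature":"17","fuel_level":null}'
    )
    for row in parse_json_each_row(message):
        # JSON null becomes None, which corresponds to a Nullable column value.
        print(row["device_id"], row["datetime"], row["fuel_level"])
```

A payload with several lines yields several rows, so one RabbitMQ message can carry more than one record.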
In the Managed Service for ClickHouse® cluster, create a table to store data incoming from RabbitMQ:
- Connect to the db1 database on your Managed Service for ClickHouse® cluster via clickhouse-client.
- Run this query:
    CREATE TABLE IF NOT EXISTS db1.cars (
        device_id String,
        datetime DateTime,
        latitude Float32,
        longitude Float32,
        altitude Float32,
        speed Float32,
        battery_voltage Nullable(Float32),
        cabin_temperature Float32,
        fuel_level Nullable(Float32)
    ) ENGINE = RabbitMQ
    SETTINGS
        rabbitmq_host_port = '<internal_IP_address_of_VM_with_RabbitMQ>:5672',
        rabbitmq_routing_key_list = 'cars',
        rabbitmq_exchange_name = 'exchange',
        rabbitmq_format = 'JSONEachRow';
This table will be automatically populated with messages consumed from the cars queue, which is bound to the RabbitMQ exchange named exchange. When reading the data, Managed Service for ClickHouse® uses the authentication credentials provided earlier.
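Before publishing, it can help to check that a record matches the columns of db1.cars. The sketch below is a hypothetical pre-publish check written in Python. The column list and the two nullable columns are taken from the CREATE TABLE query above, while the function name check_record and its rules are assumptions. It is stricter than what ClickHouse® itself may enforce and does not reproduce the server's parsing behavior.

```python
from datetime import datetime

# Column names copied from the db1.cars table definition in this guide.
COLUMNS = [
    "device_id", "datetime", "latitude", "longitude", "altitude",
    "speed", "battery_voltage", "cabin_temperature", "fuel_level",
]
# Only these two columns are declared as Nullable in the table.
NULLABLE = {"battery_voltage", "fuel_level"}


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    for column in COLUMNS:
        if column not in record:
            problems.append(f"missing field: {column}")
        elif record[column] is None and column not in NULLABLE:
            problems.append(f"null in non-nullable field: {column}")
    for key in record:
        if key not in COLUMNS:
            problems.append(f"unknown field: {key}")
    # The guide specifies the YYYY-MM-DD HH:MM:SS format for datetime.
    value = record.get("datetime")
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append(f"bad datetime format: {value}")
    return problems


if __name__ == "__main__":
    record = {
        "device_id": "iv9a94th6rzt********", "datetime": "2020-06-05 17:27:00",
        "latitude": 55.70329032, "longitude": 37.65472196, "altitude": 427.5,
        "speed": 0, "battery_voltage": 23.5, "cabin_temperature": 17,
        "fuel_level": None,
    }
    print(check_record(record))
```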
Send the test data to the RabbitMQ queue
- Create a file named sample.json with test data:
    {
        "device_id": "iv9a94th6rzt********",
        "datetime": "2020-06-05 17:27:00",
        "latitude": 55.70329032,
        "longitude": 37.65472196,
        "altitude": 427.5,
        "speed": 0,
        "battery_voltage": 23.5,
        "cabin_temperature": 17,
        "fuel_level": null
    }
    {
        "device_id": "rhibbh3y08qm********",
        "datetime": "2020-06-06 09:49:54",
        "latitude": 55.71294467,
        "longitude": 37.66542005,
        "altitude": 429.13,
        "speed": 55.5,
        "battery_voltage": null,
        "cabin_temperature": 18,
        "fuel_level": 32
    }
    {
        "device_id": "iv9a94th6rzt********",
        "datetime": "2020-06-07 15:00:10",
        "latitude": 55.70985913,
        "longitude": 37.62141918,
        "altitude": 417.0,
        "speed": 15.7,
        "battery_voltage": 10.3,
        "cabin_temperature": 17,
        "fuel_level": null
    }
- Use jq and amqp-publish to send data from sample.json to the previously created cars queue via exchange:
    jq \
        --raw-output \
        --compact-output . ./sample.json |\
    amqp-publish \
        --url=amqp://<RabbitMQ_username>:<password>@<IP_address_or_FQDN_of_the_RabbitMQ_server>:5672 \
        --routing-key=cars \
        --exchange=exchange
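In the command above, jq with --compact-output rewrites each pretty-printed object from sample.json as a single line, which is the layout JSONEachRow expects. The Python sketch below reproduces that transformation as a rough equivalent, not a replacement for jq. The function names split_json_stream and compact_lines are hypothetical, and the embedded sample is a trimmed version of the test data from this guide.

```python
import json


def split_json_stream(text: str) -> list:
    """Decode a sequence of JSON values separated only by whitespace."""
    decoder = json.JSONDecoder()
    objects = []
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects


def compact_lines(text: str) -> list:
    """Return each JSON value from the stream as one compact line."""
    return [json.dumps(obj, separators=(",", ":")) for obj in split_json_stream(text)]


if __name__ == "__main__":
    # Trimmed sample: two records with a subset of the fields.
    sample = """
    { "device_id": "iv9a94th6rzt********", "speed": 0, "fuel_level": null }
    { "device_id": "rhibbh3y08qm********", "speed": 55.5, "fuel_level": 32 }
    """
    print("\n".join(compact_lines(sample)))
```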
Make sure the test data appears in the Managed Service for ClickHouse® cluster table
To access the data, use a materialized view. Once a materialized view is attached to a RabbitMQ table, it starts gathering data in the background automatically. This enables the system to continuously consume messages from RabbitMQ and convert them to the required format using SELECT.
Note
A message from the queue can be read by ClickHouse® only once. Therefore, instead of reading data directly from the table, use a materialized view.
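The read-once behavior described in the note can be pictured with a toy model. The Python sketch below is not how ClickHouse® is implemented. The classes ToyStreamTable and ToyMaterializedView are hypothetical and only mimic the idea that reading from a queue-backed table consumes messages, while a view copies them into storage that can be queried repeatedly.

```python
from collections import deque


class ToyStreamTable:
    """Toy stand-in for a queue-backed table: every read consumes the messages."""

    def __init__(self):
        self._pending = deque()

    def publish(self, row):
        self._pending.append(row)

    def read(self):
        rows = list(self._pending)
        self._pending.clear()  # messages are gone after the first read
        return rows


class ToyMaterializedView:
    """Toy stand-in for a view that moves rows from a stream into persistent storage."""

    def __init__(self, source, target):
        self.source = source
        self.target = target  # a plain list that plays the role of a MergeTree table

    def poll(self):
        self.target.extend(self.source.read())


if __name__ == "__main__":
    stream = ToyStreamTable()
    storage = []
    view = ToyMaterializedView(stream, storage)

    stream.publish({"device_id": "iv9a94th6rzt********"})
    stream.publish({"device_id": "rhibbh3y08qm********"})
    view.poll()

    print(len(storage))   # rows kept in storage can be read any number of times
    print(stream.read())  # nothing is left to read from the stream itself
```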
To create a materialized view for the db1.cars table:
- Connect to the db1 database on your Managed Service for ClickHouse® cluster via clickhouse-client.
- Run the following queries:
    CREATE TABLE IF NOT EXISTS db1.cars_data_source (
        device_id String,
        datetime DateTime,
        latitude Float32,
        longitude Float32,
        altitude Float32,
        speed Float32,
        battery_voltage Nullable(Float32),
        cabin_temperature Float32,
        fuel_level Nullable(Float32)
    ) ENGINE = MergeTree()
    ORDER BY device_id;

    CREATE MATERIALIZED VIEW db1.cars_view TO db1.cars_data_source
        AS SELECT * FROM db1.cars;
To retrieve all data from the db1.cars_view materialized view:
- Connect to the db1 database on your Managed Service for ClickHouse® cluster via clickhouse-client.
- Run this query:
    SELECT * FROM db1.cars_view;
The query results will show all data sent to RabbitMQ.
Delete the resources you created
Delete the resources you no longer need to avoid paying for them.
To delete the resources manually:
- Delete the Yandex Managed Service for ClickHouse® cluster.
- Delete the virtual machine.
- If you reserved public static IP addresses, release and delete them.
To delete the resources using Terraform:
- In the terminal window, go to the directory containing the infrastructure plan.
  Warning
  Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.
- Delete resources:
  - Run this command:
      terraform destroy
  - Confirm deleting the resources and wait for the operation to complete.
  All the resources described in the Terraform manifests will be deleted.
ClickHouse® is a registered trademark of ClickHouse, Inc.