Ingesting data into storage systems
Data from mobile phones, smart devices, or external services often arrives as a large number of small batches. The communication channels used for transmission are often slow, and connection time is limited. Yandex Data Streams receives data arriving at high frequency and speed and forms outbound data batches for the target systems, which lets both sources and targets operate in their optimal modes. Using an API gateway to receive messages enables you to implement a custom data transmission protocol.
In this use case, the API gateway receives incoming data and sends it to the data stream. The stream buffers the data and sends it to the ClickHouse® database cluster through a transfer.
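The buffering idea described above can be illustrated with a minimal sketch. This is not how Yandex Data Streams is implemented; it only shows the general pattern of collecting many small messages and handing them to a target in larger batches. The BatchBuffer class, the flush callback, and the max_messages threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchBuffer:
    """Accumulates small messages and hands them over as larger batches."""

    flush: Callable[[List[bytes]], None]  # hypothetical sink, e.g. a bulk insert
    max_messages: int = 3                 # illustrative threshold, not a service default
    _pending: List[bytes] = field(default_factory=list)

    def put(self, message: bytes) -> None:
        """Store one message and flush once the batch is full."""
        self._pending.append(message)
        if len(self._pending) >= self.max_messages:
            self.drain()

    def drain(self) -> None:
        """Flush whatever is pending, even if the batch is incomplete."""
        if self._pending:
            self.flush(self._pending)
            self._pending = []


# Seven small messages become three outbound batches.
batches: List[List[bytes]] = []
buffer = BatchBuffer(flush=batches.append)
for i in range(7):
    buffer.put(f"reading {i}".encode())
buffer.drain()  # flush the incomplete tail
print([len(batch) for batch in batches])  # [3, 3, 1]
```

In this sketch the source can keep sending one small message at a time, while the target only sees a few larger writes.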
To set up data ingestion:
- Get your cloud ready.
- Set up your environment.
- Create a ClickHouse® cluster.
- Create a data stream.
- Create an API gateway.
- Create a transfer.
- Test sending and receiving data.
If you no longer need to ingest data, delete the associated resources.
Get your cloud ready
Sign up in Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or register a new account.
- On the Yandex Cloud Billing page, make sure you have a linked billing account with an ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.
If you have an active billing account, you can navigate to the cloud page
Learn more about clouds and folders.
Required paid resources
The cost of supporting data ingestion into storage systems includes:
- Fee for API gateway requests (see Yandex API Gateway pricing).
- Data stream maintenance fee (see Yandex Data Streams pricing).
- Fee for transferring data between sources and targets (see Yandex Data Transfer pricing).
- Fee for a continuously running Managed Service for ClickHouse® cluster (see Managed Service for ClickHouse® pricing).
Set up your environment
Create a service account and assign it the editor role for your folder.
Create a ClickHouse® cluster
- In the management console, select the folder where you want to create a DB cluster.
- Select Managed Service for ClickHouse®.
- Click Create cluster.
- Configure your ClickHouse® cluster:
- Under Basic parameters:
- Enter a name for the cluster.
- Select the service account you created earlier.
- Under Database, specify the DB name, username, and password.
- Under Hosts, edit the host settings: enable Public access and click Save.
- Under Additional settings, enable the following options:
- Access from Data Transfer.
- Access from the management console.
- Specify the remaining cluster settings by following this guide.
- Click Create cluster.
Wait for the cluster to start. When the cluster is ready for use, its status will change to Alive.
Create a data stream
- In the management console, select the folder where you want to create a data stream.
- Select Data Streams.
- Click Create stream.
- Specify an existing serverless database in YDB or create a new one. If you chose to create a new database, click Refresh after creating it to refresh the list of databases.
- Enter a name for the stream.
- Click Create.
Wait for the stream to start. When the stream is ready for use, its status will change from CREATING to ACTIVE.
Create an API gateway
- On the page of the created stream, click Actions and select API Gateway.
- Name your API gateway.
- Under Specification, replace the service_account_id key value with the ID of the service account you created earlier. Save the values of the Name and Service domain fields, as you will need them later.
- Click Create.
Wait for the API gateway to start. When the API gateway is ready for use, its status will change from CREATING to ACTIVE.
Create a transfer
- In the management console, select the folder where you want to create a transfer.
- Select Yandex Data Transfer.
- Click Create data transfer.
- Name the transfer.
- Create a source endpoint:
- Next to Source, click Create new.
- Name the endpoint.
- In the Database type list, select Yandex Data Streams.
- Select a database for the source.
- Enter the name of the stream you created earlier.
- Select the service account you created earlier.
- Click Create.
- Create a target endpoint:
- Next to Target, click Create new.
- Name the endpoint.
- In the Database type list, select ClickHouse.
- Select the MDB cluster you created earlier.
- Enter the DB name, username, and password of the cluster you created earlier.
- Click Create.
- Click Create.
- In the menu next to the name of the created transfer, select Activate.
Wait until the transfer gets activated. Once the transfer is ready for use, its status will change from Creating to Replicating.
Test sending and receiving data
- Send data to the storage system:
  curl --request POST --data 'test message' https://<url>/<paths>
  Where:
  <url>: API gateway Service domain you saved earlier.
  <paths>: API gateway Name you saved earlier.
- In the management console, select the Managed Service for ClickHouse® cluster you created earlier.
- On the left-hand panel, select SQL.
- Enter the username and password and click Connect.
- In the list, select the previously created database.
- Select a DB table.
If everything is set up properly, the table will show a new entry containing system data and the sent message.
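If you prefer to send test data from a script rather than from curl, the request from the first step can be sketched in Python using only the standard library. The function names build_ingest_request and send are hypothetical, and gateway.example.com and my-gateway are placeholder values: substitute the Service domain and Name you saved when creating the API gateway. The sketch assumes the gateway accepts the same plain POST body as the curl command above.

```python
import urllib.request


def build_ingest_request(service_domain: str, gateway_name: str,
                         message: str) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    url = f"https://{service_domain}/{gateway_name}"
    # With a request body and no explicit Content-Type, urllib sends
    # application/x-www-form-urlencoded, the same default as curl --data.
    return urllib.request.Request(url, data=message.encode("utf-8"), method="POST")


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values (assumptions): replace them with your own
# Service domain and Name before sending.
request = build_ingest_request("gateway.example.com", "my-gateway", "test message")
print(request.get_method(), request.full_url)
```

Once the gateway, stream, and transfer are running, calling send(request) submits the message; the new row should then appear in the ClickHouse® table as described above.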
How to delete the resources you created
To stop paying for the resources you used in this scenario:
- Delete the API gateway.
- Delete the transfer.
- Delete the endpoints.
- Delete the data stream.
- Delete the ClickHouse® cluster.
ClickHouse® is a registered trademark of ClickHouse, Inc.