Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Data Streams
    • All tutorials
    • Entering data into storage systems
    • Smart log processing
    • Transferring data within microservice architectures
    • Saving data to ClickHouse®
    • Replicating logs to Object Storage using Fluent Bit
    • Replicating logs to Object Storage using Data Streams
    • Migrating data to Yandex Object Storage using Yandex Data Transfer
    • Delivering data from Yandex Managed Service for Apache Kafka® using Yandex Data Transfer
    • Delivering data from an Data Streams queue to Managed Service for YDB
    • Delivering data to Yandex Managed Service for Apache Kafka® using Yandex Data Transfer
    • YDB change data capture and delivery to YDS
    • PostgreSQL change data capture and delivery to YDS
    • MySQL® change data capture and delivery to YDS
    • Streaming Yandex Cloud Postbox events to Yandex Data Streams and analyzing them using Yandex DataLens
    • Creating an interactive serverless application using WebSocket
    • Processing Audit Trails events
    • Processing CDC Debezium streams
    • Exporting audit logs to MaxPatrol SIEM
    • Searching for Yandex Cloud events in Yandex Query
  • Access management
  • Pricing policy
  • FAQ

In this article:

  • Getting started
  • Create a data stream in Data Streams
  • Set the stream connection parameters
  • Set up Debezium Server
  • Connect Query to your data stream
  • Query the data
  • See also
  1. Tutorials
  2. Processing CDC Debezium streams

Processing CDC Debezium streams

Written by
Yandex Cloud
Updated at May 7, 2025
  • Getting started
  • Create a data stream in Data Streams
  • Set the stream connection parameters
  • Set up Debezium Server
  • Connect Query to your data stream
  • Query the data
  • See also

Debezium is a change data capture service for streaming DB changes to other systems for processing. You can use Yandex Data Streams to capture these changes and Yandex Query to process them. You can do the following with processed data:

  • Send it to Yandex Monitoring to make charts and use it in alerting.
  • Write it to a stream in Data Streams and then send it to Yandex Cloud Functions for processing.
  • Write it to a stream in Data Streams and then transfer it to Yandex Data Transfer to be sent to various storage systems.

debezium-architecture

In this use case, you will send PostgreSQL database changes to a stream in Data Streams using Debezium and then run a query to them with Query. The query will return the number of changes in DB tables grouped by 10s interval. It is assumed that Debezium is installed on the server with PostgreSQL set up and running.

To implement this use case:

  1. Create a data stream in Data Streams.
  2. Set the stream connection credentials.
  3. Set up Debezium Server.
  4. Connect Query to your data stream.
  5. Query the data.

Getting startedGetting started

Sign up in Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Create a data stream in Data StreamsCreate a data stream in Data Streams

Create a data stream named debezium.

Set the stream connection parametersSet the stream connection parameters

  1. Create a service account and assign it the editor role for your folder.
  2. Create a static access key.
  3. On the server where PostgreSQL is set up and running, configure the AWS CLI:
    1. Install the AWS CLI and run this command:

      aws configure
      
    2. Enter the following one by one:

      • AWS Access Key ID [None]:: Service account key ID.
      • AWS Secret Access Key [None]:: Service account secret key.
      • Default region name [None]:: ru-central1 availability zone.

Set up Debezium ServerSet up Debezium Server

On the server where PostgreSQL is set up and running:

  1. Install the Debezium Server by following this guide.

  2. Go to the conf folder and create the file application.properties with the following content:

    debezium.sink.type=kinesis
    debezium.sink.kinesis.region=ru-central1
    debezium.sink.kinesis.endpoint=<endpoint>
    debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
    debezium.source.offset.storage.file.filename=data/offsets.dat
    debezium.source.offset.flush.interval.ms=0
    debezium.source.database.hostname=localhost
    debezium.source.database.port=5432
    debezium.source.database.user=<username>
    debezium.source.database.password=<user_password>
    debezium.source.database.dbname=<DB_name>
    debezium.source.database.server.name=debezium
    debezium.source.plugin.name=pgoutput
    
    debezium.source.transforms=Reroute
    debezium.source.transforms.Reroute.type=io.debezium.transforms.ByLogicalTableRouter
    debezium.source.transforms.Reroute.topic.regex=(.*)
    debezium.source.transforms.Reroute.topic.replacement=<data_stream>
    

    Where:

    • <endpoint>: Endpoint for a Data Streams data stream, e.g., https://yds.serverless.yandexcloud.net/ru-central1/b1g89ae43m6he********/etn01eg4rn1********. You can find the endpoint on the stream page (see Viewing a list of streams).
    • <data_stream>: Data Streams data stream name.
    • <DB_name>: PostgreSQL database name.
    • <username>: Username for connecting to the PostgreSQL database.
    • <user_password>: User password for connecting to the PostgreSQL database.
  3. Run Debezium using this command:

    JAVA_OPTS=-Daws.cborEnabled=false ./run.sh
    
  4. Make some changes to the PostgreSQL database, such as insert data into a table.

  5. If the settings are correct, the Debezium console will show messages like:

    2022-02-11 07:31:12,850 INFO  [io.deb.con.com.BaseSourceTask](pool-7-thread-1) 1 records sent during previous 00:19:59.999, last recorded offset: {transaction_id=null, lsn_proc=23576408, lsn_commit=23576120, lsn=23576408, txId=580, ts_usec=1644564672582666}
    

Connect Query to your data streamConnect Query to your data stream

  1. Create a connection named yds-connection of the Data Streams type.
  2. On the binding creation page:
    • Enter a name for the binding: debezium.
    • Specify the data stream: cdebezium.
    • Add a column titled data with JSON for type.
  3. Click Create.

Query the dataQuery the data

Open the query editor in the Query interface and run the query:

$debezium_data = 
SELECT 
    JSON_VALUE(data,"$.payload.source.table") AS table_name, 
    DateTime::FromMilliseconds(cast(JSON_VALUE(data,"$.payload.source.ts_ms") AS Uint64)) AS `timestamp`
FROM bindings.`debezium`;

SELECT 
    table_name, 
    HOP_END() 
FROM 
    $debezium_data 
GROUP BY 
    HOP(`timestamp`, "PT10S", "PT10S", "PT10S"),
    table_name
LIMIT 2;

Note

Data from a stream source is transferred as an infinite stream. To stop data processing and output the result to the console, the data in the example is limited with the LIMIT operator that sets the number of rows in the result.

See alsoSee also

  • Reading data from Data Streams using Query connections

Was the article helpful?

Previous
Configuring Yandex Query
Next
Exporting audit logs to MaxPatrol SIEM
© 2025 Direct Cursus Technology L.L.C.