© 2025 Direct Cursus Technology L.L.C.

In this article:

  • Benefits
  • Receiving data
  • Reliability
  • Batching
  • Rewinding data
  • Multiple storage systems
  • Masking data and processing logs
  • Reading data
  • Setup

Ingesting data into storage systems

Written by
Yandex Cloud
Updated at August 15, 2025

Mobile phones, various smart devices, and external services are increasingly replacing application components as data sources.

Such sources supply data in huge numbers of small batches. The communication channels they use are often slow, and connection time may be limited. Under these conditions, you want to save the incoming data quickly; processing it can wait until later. That is why the data is first sent to a data streaming bus, from which it is then pulled for processing.

Acting like a streaming bus, Yandex Data Streams provides optimal operating conditions for both sources and targets:

  • Receives high-frequency and high-speed incoming data without blocking the sources.
  • Saves the received data in its own storage.
  • Groups data into batches and sends them to the target systems, thus reducing their load.

Benefits

When working with external devices or services, you want to quickly save the data you receive. You can fetch the saved data from Data Streams through direct reads or by setting up data delivery to Yandex Cloud storage systems using Yandex Data Transfer.

Receiving data

Data Streams receives data over HTTP. Using Yandex API Gateway, you can implement any data ingestion protocol. After ingestion in API Gateway, the data is ready to move to Data Streams.

Data Streams is highly scalable and can accept data from thousands of data sources at the same time.
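As an illustration of the write path, the sketch below sends one record through the Kinesis-compatible API with boto3. The endpoint URL, region, stream name, and partition key here are assumptions; take the real values from your stream's page in the management console.

```python
import json


def build_record(payload: dict, partition_key: str) -> dict:
    # Serialize the payload; Data Streams treats record data as opaque bytes.
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }


def send_record(stream_name: str, payload: dict, partition_key: str) -> dict:
    # boto3 is imported lazily so the module loads even without the AWS SDK.
    import boto3

    # Hypothetical endpoint and region; replace with your stream's values.
    client = boto3.client(
        "kinesis",
        endpoint_url="https://yds.serverless.yandexcloud.net",
        region_name="ru-central1",
    )
    return client.put_record(StreamName=stream_name,
                             **build_record(payload, partition_key))
```

The partition key determines which shard the record lands in, so records sharing a key keep their relative order.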

Reliability

A data streaming bus is a critical infrastructure component that is tolerant to all kinds of Yandex Cloud failures. Data Streams stores ingested data across at least three Yandex Cloud availability zones.

Batching

Data storage and processing systems perform best when data is written in batches. Data batching is most effective at a single point where all your data flows together. Data buses typically serve as that single point.
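A minimal sketch of the grouping step: a helper that packs encoded records into batches sized for a Kinesis-compatible `PutRecords` call. The default limits (500 records, about 5 MiB per request) mirror the standard Kinesis quotas; check the current Data Streams quotas for your stream, as they may differ.

```python
def batch_records(records, max_batch=500, max_bytes=5 * 1024 * 1024):
    """Group encoded records into PutRecords-sized batches.

    Each record is a dict with "Data" (bytes) and "PartitionKey" (str).
    """
    batches, current, current_size = [], [], 0
    for rec in records:
        size = len(rec["Data"]) + len(rec["PartitionKey"].encode("utf-8"))
        # Flush the current batch before it exceeds either limit.
        if current and (len(current) >= max_batch or current_size + size > max_bytes):
            batches.append(current)
            current, current_size = [], 0
        current.append(rec)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be submitted as the `Records` argument of a single `put_records` call, reducing the number of requests the target systems have to absorb.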

Rewinding data

Unlike message queues, data buses store data until the retention period expires without deleting the data after it is read. This allows you to move across the stored data in any direction: from the oldest to the most recent. For example, if a new data format appears and gets written to the target system incorrectly, you can rewind the data stored in the bus to the beginning and then reread and rewrite it to the target system correctly.
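Under the Kinesis-compatible API, rewinding comes down to choosing the shard iterator type when you start reading. A small sketch of that choice (the stream and shard names are placeholders):

```python
def rewind_iterator_args(stream_name: str, shard_id: str, start=None) -> dict:
    # Build arguments for get_shard_iterator. TRIM_HORIZON rereads the
    # stream from the oldest retained record; AT_TIMESTAMP rewinds to a
    # specific moment, e.g. just before a bad format change was deployed.
    args = {"StreamName": stream_name, "ShardId": shard_id}
    if start is None:
        args["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        args["ShardIteratorType"] = "AT_TIMESTAMP"
        args["Timestamp"] = start
    return args
```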

Multiple storage systems

The same data is often stored in multiple storage systems at once: for example, ClickHouse® serves fast analytics, while Object Storage provides long-term storage. Data buses make this easy to arrange: since different applications can read the data concurrently, you can configure delivery of the same data to both storage systems, ClickHouse® and Object Storage. This setup also lets you add a third storage system, such as Greenplum® or Elasticsearch, at any time.

The multiple-storage-system approach is also convenient for complying with FZ-152, PCI DSS, and other standards that require retaining data for at least a year. In that case, the last month's data goes to a fast-access storage system, while the rest is sent to long-term "cold" storage in Object Storage.

Masking data and processing logs

Access to data usually differs across employees. For example, some data may contain users' personal information, access to which must be restricted.

You can send the data to Cloud Functions for masking or any additional processing as needed.

Once processed, the data can be sent to multiple target systems at once: all employees can be granted access to the version with masked personal data, while only administrators can access the full data.
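For illustration, a masking function of the kind you might deploy to Cloud Functions could look like this minimal sketch. The regular expressions are assumptions: real masking rules depend on which personal fields your records actually contain.

```python
import re

# Hypothetical patterns for two common kinds of personal data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s()-]{8,}\d")


def mask_record(text: str) -> str:
    # Replace personal identifiers before the record reaches the broadly
    # accessible target system; the full, unmasked record can still be
    # delivered to an administrators-only system in parallel.
    text = EMAIL.sub("***@***", text)
    text = PHONE.sub("***", text)
    return text
```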

Reading data

Data Streams supports automatic processing of stored data through code. Data Streams is compatible with the Amazon Kinesis Data Streams API, allowing you to use SDKs for different programming languages: C++, Java, Go, Python, and more.
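As a sketch of programmatic reading, here is a minimal loop over the Kinesis-compatible API. The client is passed in as a parameter, so any Kinesis-compatible client works (for example, boto3 pointed at the Data Streams endpoint) and the loop stays testable without a network connection; a production reader would also pause or stop on empty batches instead of polling continuously.

```python
def iter_records(client, stream_name: str, shard_id: str):
    # Start from the oldest retained record in the shard.
    it = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    while it:
        resp = client.get_records(ShardIterator=it, Limit=100)
        for rec in resp["Records"]:
            yield rec["Data"]
        # A missing/None NextShardIterator means the shard is closed.
        it = resp.get("NextShardIterator")
```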

Setup

To set up data ingestion into storage systems:

  1. Create a data stream in Data Streams.

  2. Configure the AWS SDK.

  3. Configure Yandex Data Transfer to transfer data to the selected storage system.

    For an example of setting up data delivery from Data Streams, see the tutorial on how to save data to ClickHouse®.

  4. Connect any data processing function to Yandex Data Transfer. This GitHub example illustrates the function code. Alternatively, you can use the SDK to read data directly from Data Streams:

    • Go
    • C++
    • Java
    • JavaScript
    • Python
    • HTTP Kinesis Data Streams API

ClickHouse® is a registered trademark of ClickHouse, Inc.
