Terms and definitions in Data Streams
Data stream
A data stream is a named set of messages. Messages in Data Streams are written to and read from streams. Data streams are built on Yandex Managed Service for YDB and are stored in databases.
Shard
To enable horizontal scaling, a stream is divided into shards, which are units of parallelism. Each shard has a limited throughput.
Note
Currently, the only way to reduce the number of shards in a stream is to delete the stream and recreate it with fewer shards.
You can configure a data stream to increase the number of shards as the write rate to the stream grows. For more information, see autopartitioning in the YDB documentation.
Shard key
A shard key is specified for each message while writing it to a stream. Using the key hash, the message is mapped to a certain shard inside the stream.
Warning
When the number of shards in a stream is updated, the distribution of shards across the key hash space changes as well. Messages written before the update remain in the same shards and in the same order. New messages are distributed across the new set of shards.
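The key-to-shard mapping and the effect of changing the shard count can be illustrated with a minimal sketch. This text does not specify the hash function or how the hash space is split, so the MD5 hash, the 128-bit hash space, the equal-sized ranges, and the name shard_for_key are all assumptions made for illustration only.

```python
import hashlib

HASH_SPACE = 2**128  # assumed size of the key hash space (128-bit hash)


def shard_for_key(shard_key: str, shard_count: int) -> int:
    """Map a shard key to a shard index via its hash (illustrative assumption)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # Assumed hash choice: interpret the MD5 digest as a 128-bit integer.
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    key_hash = int.from_bytes(digest, "big")
    # Split the hash space into shard_count equal contiguous ranges.
    return key_hash * shard_count // HASH_SPACE


# The same key can map to a different shard once the shard count changes.
keys = [f"device-{i}" for i in range(1000)]
moved = sum(shard_for_key(k, 4) != shard_for_key(k, 8) for k in keys)
print(f"{moved} of {len(keys)} keys map to a different shard index (4 -> 8 shards)")
```

Under these assumptions, a consumer that relies on per-key ordering should expect messages for one key to appear in one shard before the update and possibly in another shard after it.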
Shard throughput
The throughput of each shard is limited and user-defined. The maximum write rate per shard is 1 MB/s, and the maximum read rate is 2 MB/s.
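The per-shard limits above allow a rough estimate of how many shards a workload needs. The following is a back-of-the-envelope sketch, assuming shard keys spread the load evenly across shards and that each shard is configured for the maximum throughput; the function name min_shards and its parameters are hypothetical.

```python
import math

MAX_WRITE_MB_PER_SEC = 1.0  # per-shard write limit stated above
MAX_READ_MB_PER_SEC = 2.0   # per-shard read limit stated above


def min_shards(
    write_mb_per_sec: float,
    read_mb_per_sec: float,
    shard_write_limit: float = MAX_WRITE_MB_PER_SEC,
    shard_read_limit: float = MAX_READ_MB_PER_SEC,
) -> int:
    """Estimate the smallest shard count that covers both write and read load."""
    by_write = math.ceil(write_mb_per_sec / shard_write_limit)
    by_read = math.ceil(read_mb_per_sec / shard_read_limit)
    # A stream always has at least one shard.
    return max(1, by_write, by_read)


# 3.5 MB/s of writes needs 4 shards; 10 MB/s of reads needs 5 shards.
print(min_shards(write_mb_per_sec=3.5, read_mb_per_sec=10.0))  # 5
```

If shard keys are skewed, individual shards can hit their limit earlier than this estimate suggests, so the result is a lower bound rather than a sizing guarantee.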
Message
A message is the minimum atomic unit of user information.
It consists of a body and additional system properties.
Message body
The body of a message is a set of bytes. Data Streams does not interpret the message body in any way.
Message sequence number
When a message is written to a stream, it is assigned a sequence number. Message sequence numbers are unique within a single shard and increase sequentially.
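The per-shard scope of sequence numbers can be shown with a toy in-memory model. This is not the service's implementation: the class ToyStream is hypothetical, and starting the numbering at 0 is an assumption.

```python
from collections import defaultdict
from itertools import count


class ToyStream:
    """In-memory model in which every shard keeps its own sequence counter."""

    def __init__(self) -> None:
        # shard index -> independent counter (assumed to start at 0)
        self._counters = defaultdict(count)
        # shard index -> list of (sequence number, message body)
        self.shards = defaultdict(list)

    def write(self, shard: int, body: bytes) -> int:
        """Append a message to a shard and return its sequence number."""
        seq = next(self._counters[shard])
        self.shards[shard].append((seq, body))
        return seq


stream = ToyStream()
first = stream.write(0, b"a")   # first message in shard 0
second = stream.write(0, b"b")  # next number in the same shard
other = stream.write(1, b"c")   # shard 1 numbers its messages independently
```

In this model the same sequence number can appear in two different shards, so a message is identified by the pair of shard and sequence number rather than by the sequence number alone.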
Message retention period
The message retention period is set for each stream. After it expires, messages are automatically deleted.
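Retention can be pictured as a time-based filter over stored messages. The sketch below is only a model: the function drop_expired is hypothetical, and treating a message whose age exactly equals the retention period as expired is an assumption, since the boundary behavior is not described here.

```python
from datetime import datetime, timedelta, timezone


def drop_expired(messages, retention: timedelta, now: datetime):
    """Keep only (written_at, body) pairs that are younger than the retention period."""
    cutoff = now - retention
    return [(written_at, body) for written_at, body in messages if written_at > cutoff]


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
messages = [
    (now - timedelta(hours=30), b"old"),   # older than 24 hours
    (now - timedelta(hours=1), b"fresh"),  # within the retention period
]
kept = drop_expired(messages, retention=timedelta(hours=24), now=now)
```

Here only the message written one hour ago remains after the filter is applied.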
Consumers
Consumers are applications that get data from Data Streams and process it. All consumers share the total quota for data reads.
Consumer groups
In some cases, the common quota