Serialization

Written by Yandex Cloud
Updated on April 9, 2025
  • Serialization on delivery to Object Storage
    • Yandex Data Streams
    • Managed Service for PostgreSQL
  • Serialization at data delivery to message queues
    • Auto
    • Debezium

Serialization is the conversion of data objects to a bit sequence when transferring data to targets that work with raw data. These targets include:

  • Object Storage
  • Apache Kafka® and Yandex Data Streams message queues

You can set up serialization when creating or updating a target endpoint.

Serialization on delivery to Object Storage

When delivering to Object Storage, you can select Serialization format: JSON, CSV, PARQUET, or Raw data. For JSON, the Convert complex data to strings setting is available.

The output data format depends both on the selected Serialization format setting and on the type and settings of the source endpoint's conversion rules.

The examples below show how the output data differs when no conversion rules are set for the source endpoint.

Note

There are no examples for PARQUET output data, since this format is binary.
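
You can still inspect PARQUET output with any Parquet reader. Below is a minimal sketch, assuming a delivered object has been downloaded locally as data.parquet (a hypothetical name) and that the pyarrow package is installed; it is not part of Yandex Data Transfer itself.

import pyarrow.parquet as pq

# Minimal sketch: inspect a PARQUET object delivered to Object Storage.
# "data.parquet" is a hypothetical local copy of a delivered object.
table = pq.read_table("data.parquet")
print(table.schema)           # column names and types
print(table.to_pylist()[:5])  # first five rows as Python dicts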

Yandex Data Streams

Input data: two messages:

Text string
{"device_id":"iv9","speed":5}

Output data:

JSON:

{"data":"Text string","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}
{"data":"{\"device_id\":\"iv9\",\"speed\":5}","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}

CSV:

<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,Text string
<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,"{""device_id"":""iv9"",""speed"":5}"

Raw data:

Text string
{"device_id":"iv9","speed":5}

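As an illustration of how this output can be consumed, here is a minimal sketch that parses the JSON-serialized lines shown above. The file name records.jsonl is a hypothetical local copy of a delivered object; the field names (data, partition, seq_no, topic, write_time) come from the example.

import json

# Minimal sketch: read JSON-serialized Yandex Data Streams records delivered to Object Storage.
# "records.jsonl" is a hypothetical local copy of a delivered object.
with open("records.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record["topic"], record["seq_no"], record["write_time"])
        payload = record["data"]  # original message body as a string
        try:
            print(json.loads(payload))  # message that was itself JSON
        except json.JSONDecodeError:
            print(payload)              # plain text message, e.g. "Text string"
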
Managed Service for PostgreSQL

Input data: a table:

device_id  speed
iv9        5
rhi        10

Output data:

JSON:

{"device_id":"iv9","speed":5}
{"device_id":"rhi","speed":10}

CSV:

iv9,5,
rhi,10,

Raw data:

This is not supported.
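
For comparison with the JSON output above, the sketch below reproduces those lines from the sample table rows. The column names and values come from the example; the code itself is only an illustration, not service code.

import json

# Minimal sketch: how the sample table rows map to JSON-serialized output lines.
rows = [
    {"device_id": "iv9", "speed": 5},
    {"device_id": "rhi", "speed": 10},
]
for row in rows:
    print(json.dumps(row, separators=(",", ":")))
# Output:
# {"device_id":"iv9","speed":5}
# {"device_id":"rhi","speed":10}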

Serialization at data delivery to message queues

When delivering data to a message queue, you can use two types of serialization:

  • Auto
  • Debezium

Auto

Automatic selection of serialization settings depending on the source type.

Debezium

Debezium serialization with configurable parameters:

  • dt.add.original.type.info: Determines whether to add information about the original data types so that the types can be restored after the transfer.

    Exception: for PostgreSQL date and time data types with time zones, time zone information cannot be restored.

    The default value is false.

  • dt.batching.max.size: Maximum message size in bytes.

    The default value is 0 bytes (batching is disabled). The recommended value is 1048576 bytes (1 MB). Any non-zero value enables batching.

    This setting is relevant if you use JSON serialization with Schema Registry (see the key.converter and value.converter parameters). With Schema Registry, the enqueued messages may be very small; in that case, batching increases queue throughput.

    When batching is enabled, a single queue message comprises a sequence of logical messages in Confluent wire format. Data serialized this way cannot be decoded by ordinary consumers; see the warning below.

    When batching is enabled, messages are aggregated in a buffer. If adding a new message would exceed the dt.batching.max.size value, the current buffer is flushed as a batch, and the new message is placed into an empty buffer. If a single logical message from the source exceeds dt.batching.max.size, a batch consisting of that one message is created. Batching takes place before compression in the queue client.

    Enabling batching can be useful for optimizing high-volume delivery to a queue whose messages are subsequently read by a transfer.

    Warning

    Only a transfer can decode batched messages.

  • dt.mysql.timezone: Time zone for MySQL® date and time data types in IANA format.

    The default value is UTC.

  • dt.unknown.types.policy: Policy that determines the behavior for handling user-defined data types.

    The possible values are:

    • skip: Do not abort the transfer and ignore user-defined data types.
    • to_string: Do not abort the transfer and convert user-defined data types to text.
    • fail: Abort the transfer and return an error.

    The default value is skip.

  • decimal.handling.mode: Mode for handling real numbers.

    The possible values are:

    • precise: Precise conversion using the java.math.BigDecimal class.
    • double: Conversion to a double data type. This may result in precision loss.
    • string: Conversion to text.

    The default value is precise.

  • interval.handling.mode: Mode for handling time intervals.

    The possible values are:

    • numeric: Approximate conversion to microseconds.
    • string: Precise conversion based on the string template: P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S.

    The default value is numeric.

  • key.converter and value.converter: Key and value converters.

    The possible values are:

    • org.apache.kafka.connect.json.JsonConverter: JSON, standard for Debezium.
    • io.confluent.connect.json.JsonSchemaConverter: Confluent Schema Registry.

    The default value is org.apache.kafka.connect.json.JsonConverter.

  • key.converter.schemas.enable and value.converter.schemas.enable: Whether to add a schema description to each message for keys and values when using org.apache.kafka.connect.json.JsonConverter.

    The default value is true.

  • key.converter.schema.registry.url and value.converter.schema.registry.url: Whether to add a schema description to each message for keys and values when using io.confluent.connect.json.JsonSchemaConverter.

    The possible values are:

    • Empty string (default): Do not add a schema description.
    • URL string value defining the path to the schema registry service.
  • key.converter.dt.json.generate.closed.content.schema and value.converter.dt.json.generate.closed.content.schema: Determine whether to use the closed content model to generate the data producer schema for the key and value. This enables performing compatibility checks by converting the consumer open model to the closed one and searching for a similar schema among those registered for the producer.

    The default value is false.

    To maintain full transitive compatibility when adding or removing optional fields in the key schema:

    1. Select the Optional-friendly compatibility check policy in the Schema Registry namespace.
    2. In serialization settings of the Managed Service for Apache Kafka® target endpoint, set key.converter.dt.json.generate.closed.content.schema to true.

    To maintain full transitive compatibility when adding or removing optional fields in the value schema:

    1. Select the Optional-friendly compatibility check policy in the Schema Registry namespace.
    2. In serialization settings of the target endpoint, set value.converter.dt.json.generate.closed.content.schema to true.
  • key.converter.basic.auth.user.info and value.converter.basic.auth.user.info: Username and password for authorization in Confluent Schema Registry for keys and values when using io.confluent.connect.json.JsonSchemaConverter.

    Value format: <username>:<password>.

  • key.converter.ssl.ca and value.converter.ssl.ca: Contents of Confluent Schema Registry's SSL certificate for keys and values when using io.confluent.connect.json.JsonSchemaConverter.

    If the setting value is not specified, the SSL certificate does not get verified.

  • unavailable.value.placeholder: Value that replaces data if its type is not supported.

    The default value is __debezium_unavailable_value.
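
To show how these parameters fit together, here is an illustrative sketch of one possible Debezium serializer parameter set: Schema Registry converters for keys and values, batching at the recommended 1 MB, and text conversion for unsupported types. The registry URL and credentials are placeholders; the exact way to pass these key-value pairs depends on the target endpoint settings (see the Terraform reference for details).

# Illustrative only: one possible combination of the Debezium serializer parameters
# described above. Placeholders in angle brackets must be replaced with real values.
debezium_serializer_parameters = {
    "dt.add.original.type.info": "true",
    "dt.batching.max.size": "1048576",  # recommended 1 MB; 0 disables batching
    "key.converter": "io.confluent.connect.json.JsonSchemaConverter",
    "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
    "key.converter.schema.registry.url": "https://<registry_host>",
    "value.converter.schema.registry.url": "https://<registry_host>",
    "key.converter.basic.auth.user.info": "<username>:<password>",
    "value.converter.basic.auth.user.info": "<username>:<password>",
    "dt.unknown.types.policy": "to_string",
    "decimal.handling.mode": "string",
}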
