Serialization
Serialization is the conversion of data objects to a bit sequence when transferring data to targets that work with raw data. These targets include Object Storage and message queues, such as Apache Kafka® and Yandex Data Streams.
You can set up serialization when creating or updating a target endpoint.
Serialization on delivery to Object Storage
When delivering to Object Storage, you can select the Serialization format: JSON, CSV, PARQUET, or Raw data. For JSON, the Convert complex data to strings setting is available.
The output data format depends both on the selected Serialization format and on the type and settings of the source endpoint conversion rules. The examples below show how the output data differs when no conversion rules are set for the source endpoint.
Note
There are no examples of PARQUET output data because this format is binary.
Yandex Data Streams
Input data: two messages.

Text string
{"device_id":"iv9","speed":5}
Output data:

JSON:

{"data":"Text string","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}
{"data":"{\"device_id\":\"iv9\",\"speed\":5}","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}

CSV:

<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,Text string
<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,"{""device_id"":""iv9"",""speed"":5}"

Raw data:

Text string
{"device_id":"iv9","speed":5}
Managed Service for PostgreSQL
Input data: a table.

| device_id | speed |
|---|---|
| iv9 | 5 |
| rhi | 10 |
Output data:

JSON:

{"device_id":"iv9","speed":5}
{"device_id":"rhi","speed":10}

CSV:

iv9,5,
rhi,10,

Raw data: Not supported.
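The mapping from table rows to output lines can be reproduced with a short Python sketch: one compact JSON object per row, and one CSV line per row (the trailing comma in the sample output corresponds to an empty trailing field). This illustrates the format only, not the service's implementation:

```python
import json

rows = [
    {"device_id": "iv9", "speed": 5},
    {"device_id": "rhi", "speed": 10},
]

# JSON: one compact object per row.
for row in rows:
    print(json.dumps(row, separators=(",", ":")))
# {"device_id":"iv9","speed":5}
# {"device_id":"rhi","speed":10}

# CSV: one line per row, ending with an empty trailing field.
for row in rows:
    print(",".join(str(v) for v in row.values()) + ",")
# iv9,5,
# rhi,10,
```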
Serialization on delivery to message queues
When delivering data to a message queue, you can use two types of serialization:
Auto
Automatic selection of serialization settings depending on the source type.
Debezium
Debezium serialization. You can configure the settings listed below; a combined configuration example follows the list.
- dt.add.original.type.info: Determines whether to add information about the original data types to messages so that the types can be restored after the transfer. Exception: PostgreSQL date and time data types with time zone, for which time zone information cannot be restored. The default value is false.
- dt.batching.max.size: Maximum message size in bytes. The default value is 0 bytes (batching is disabled). The recommended value is 1048576 bytes (1 MB). A non-empty value enables batching.

  The setting is relevant if you use JSON serialization with Schema Registry (see the key.converter and value.converter settings). When using Schema Registry, the enqueued messages may become very small; in such a case, batching increases the throughput of queues.

  When batching is enabled, a single queue message comprises a sequence of logical messages in Confluent wire format. Data serialized in this way cannot be easily decoded by ordinary consumers. Messages are aggregated in a buffer: if a new message would lead to exceeding the dt.batching.max.size value, the current buffer is retained as a batch and the new message is added to an empty buffer. If one logical message from the source exceeds the dt.batching.max.size value, a batch consisting of this single message is created. Batching takes place before compression in the queue client. A sketch of this buffering rule is shown after the list.

  Enabling batching can be useful to optimize heavy delivery to a queue where messages are read by a transfer.

  Warning
  Only a transfer can decode batched messages.
- dt.mysql.timezone: Time zone for MySQL® date and time data types, in IANA format. The default value is UTC.
- dt.unknown.types.policy: Policy for handling user-defined data types. The possible values are:
  - skip: Do not abort the transfer; ignore user-defined data types.
  - to_string: Do not abort the transfer; convert user-defined data types to text.
  - fail: Abort the transfer and return an error.

  The default value is skip.
- decimal.handling.mode: Mode for handling real numbers. The possible values are:
  - precise: Precise conversion using the java.math.BigDecimal class.
  - double: Conversion to the double data type. This may result in precision loss.
  - string: Conversion to text.

  The default value is precise. Sample representations for each mode are shown after the list.
- interval.handling.mode: Mode for handling time intervals. The possible values are:
  - numeric: Approximate conversion to microseconds.
  - string: Precise conversion based on the string template P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S. For example, an interval of one year and two months is rendered as P1Y2M0DT0H0M0S.

  The default value is numeric.
- key.converter and value.converter: Key and value converters. The possible values are:
  - org.apache.kafka.connect.json.JsonConverter: JSON, the Debezium standard.
  - io.confluent.connect.json.JsonSchemaConverter: Confluent Schema Registry.

  The default value is org.apache.kafka.connect.json.JsonConverter.
- key.converter.schemas.enable and value.converter.schemas.enable: Whether to add a schema description to each message for keys and values when using org.apache.kafka.connect.json.JsonConverter. The default value is true. An example envelope is shown after the list.
- key.converter.schema.registry.url and value.converter.schema.registry.url: Schema registry URL used to add a schema description to each message for keys and values when using io.confluent.connect.json.JsonSchemaConverter. The possible values are:
  - Empty string (default): Do not add a schema description.
  - A URL string defining the path to the schema registry service.
- key.converter.dt.json.generate.closed.content.schema and value.converter.dt.json.generate.closed.content.schema: Determine whether to use the closed content model to generate the data producer schema for the key and value. This enables compatibility checks that convert the consumer open model to the closed one and search for a similar schema among those registered for the producer. The default value is false.

  To maintain full transitive compatibility when adding or removing optional fields in the key schema:
  - Select the Optional-friendly compatibility check policy in the Schema Registry namespace.
  - In the serialization settings of the Managed Service for Apache Kafka® target endpoint, set key.converter.dt.json.generate.closed.content.schema to true.

  To maintain full transitive compatibility when adding or removing optional fields in the value schema:
  - Select the Optional-friendly compatibility check policy in the Schema Registry namespace.
  - In the serialization settings of the target endpoint, set value.converter.dt.json.generate.closed.content.schema to true.
- key.converter.basic.auth.user.info and value.converter.basic.auth.user.info: Username and password for authorization in Confluent Schema Registry for keys and values when using io.confluent.connect.json.JsonSchemaConverter. Value format: <username>:<password>.
- key.converter.ssl.ca and value.converter.ssl.ca: Contents of the Confluent Schema Registry SSL certificate for keys and values when using io.confluent.connect.json.JsonSchemaConverter. If no value is specified, the SSL certificate is not verified.
- unavailable.value.placeholder: Value that replaces data whose type is not supported. The default value is __debezium_unavailable_value.
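As referenced in the dt.batching.max.size description, the buffering rule can be sketched as follows. This is a minimal Python illustration of the documented logic, not the transfer's actual implementation; per the warning above, real batched messages are framed in Confluent wire format and can only be decoded by a transfer.

```python
from typing import Iterable, Iterator, List

def batch_messages(messages: Iterable[bytes], max_size: int) -> Iterator[List[bytes]]:
    """Group logical messages into batches following the documented rule:
    if adding a message would exceed max_size, the current buffer is
    retained as a batch first; a single oversized message forms a batch
    of its own."""
    buffer: List[bytes] = []
    buffered = 0
    for msg in messages:
        if buffer and buffered + len(msg) > max_size:
            yield buffer                  # retain the current buffer
            buffer, buffered = [], 0
        buffer.append(msg)
        buffered += len(msg)
        if buffered > max_size:           # oversized single message
            yield buffer
            buffer, buffered = [], 0
    if buffer:
        yield buffer
```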
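For orientation, here is how a NUMERIC value may look in JSON output under each decimal.handling.mode value. The precise mode follows Kafka Connect's Decimal logical type, which encodes the unscaled integer bytes in base64 while the scale travels in the schema; this sketch assumes that standard Debezium behavior and is not output captured from the service.

```python
import base64
from decimal import Decimal

value = Decimal("2.5")  # sample NUMERIC value with scale 1

# precise: base64 of the unscaled integer bytes (25 -> 0x19 -> "GQ==");
# the scale (1) is carried in the message schema, not in the payload.
unscaled = int(value.scaleb(1))
print(base64.b64encode(unscaled.to_bytes(1, "big")).decode())  # GQ==

# double: a plain JSON number, with possible precision loss.
print(float(value))  # 2.5

# string: plain text.
print(str(value))    # 2.5
```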
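As referenced in the key.converter.schemas.enable and value.converter.schemas.enable description, org.apache.kafka.connect.json.JsonConverter with schemas enabled wraps each key or value in a standard Kafka Connect envelope with schema and payload fields. A schematic, abridged example for the sample table above (real Debezium envelopes carry additional fields, such as source metadata):

```json
{
  "schema": {
    "type": "struct",
    "optional": false,
    "fields": [
      {"field": "device_id", "type": "string", "optional": false},
      {"field": "speed", "type": "int32", "optional": true}
    ]
  },
  "payload": {"device_id": "iv9", "speed": 5}
}
```

With schemas.enable set to false, only the contents of payload are written.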
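Finally, the combined configuration example referenced at the top of the list. This is a hypothetical selection of the settings described above for writing Schema Registry-framed JSON with batching enabled; the URL and credentials are placeholders:

```text
key.converter=io.confluent.connect.json.JsonSchemaConverter
value.converter=io.confluent.connect.json.JsonSchemaConverter
key.converter.schema.registry.url=https://<schema_registry_host>
value.converter.schema.registry.url=https://<schema_registry_host>
key.converter.basic.auth.user.info=<username>:<password>
value.converter.basic.auth.user.info=<username>:<password>
dt.batching.max.size=1048576
decimal.handling.mode=precise
unavailable.value.placeholder=__debezium_unavailable_value
```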