Serialization
Serialization is the conversion of data objects to a bit sequence when transferring data to targets that work with raw
data. These targets include Object Storage and message queues.
You can set up serialization when creating or updating a target endpoint.
Serialization on delivery to Object Storage
When delivering to Object Storage, you can select the Serialization format: JSON, CSV, PARQUET, or Raw data. For JSON, the Convert complex data to strings setting is available.
The output data format depends on both the selected Serialization format and the type and settings of the source endpoint's conversion rules.
The examples below show how the output data differs when no conversion rules are set for the source endpoint.
Note
There are no examples for PARQUET output data, since this format is binary.
Yandex Data Streams
Input data: two messages:
Text string
{"device_id":"iv9","speed":"5"}
Output data:
CSV:
<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,Text string
<stream_name>,<segment_key>,<message_sequence_number>,<data_recording_date_and_time>,"{""device_id"":""iv9"",""speed"":5}"
JSON:
{"data":"Text string","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}
{"data":"{\"device_id\":\"iv9\",\"speed\":5}","partition":<segment_key>,"seq_no":<message_sequence_number>,"topic":"<stream_name>","write_time":"<data_recording_date_and_time>"}
Raw data:
Text string
{"device_id":"iv9","speed":"5"}
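The CSV and JSON shapes in this example can be approximated with a short Python sketch. It is an illustration only, not the service's implementation: the function names and the sample metadata values (stream name, segment key, sequence number, timestamp format) are hypothetical, and the payload is taken as it appears in the CSV and JSON output lines.

```python
import csv
import io
import json


def to_csv_line(topic, partition, seq_no, write_time, data):
    """Render one message as a CSV row: metadata columns first, payload last."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields containing commas or quotes
    # and doubles any embedded double quotes.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow([topic, partition, seq_no, write_time, data])
    return buf.getvalue().rstrip("\n")


def to_json_line(topic, partition, seq_no, write_time, data):
    """Render one message as a compact JSON object; the payload stays a string."""
    record = {
        "data": data,
        "partition": partition,
        "seq_no": seq_no,
        "topic": topic,
        "write_time": write_time,
    }
    return json.dumps(record, separators=(",", ":"))


# Hypothetical metadata values, used only for illustration.
meta = ("my-stream", 0, 1, "2024-01-01T00:00:00Z")
payload = '{"device_id":"iv9","speed":5}'

print(to_csv_line(*meta, payload))
print(to_json_line(*meta, payload))
```

In this sketch the payload is never parsed: it is quoted in the CSV row and escaped inside the JSON `data` field, which matches the shape of the output lines shown above.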
Managed Service for PostgreSQL
Input data: Table:
device_id | speed |
---|---|
iv9 | 5 |
rhi | 10 |
Output data:
JSON:
{"device_id":"iv9","speed":5}
{"device_id":"rhi","speed":10}
CSV:
iv9,5,
rhi,10,
Raw data:
This format is not supported.
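As a rough illustration of the JSON output in this example, the sketch below turns the two table rows into one compact JSON object per line. The `rows_to_json_lines` helper is hypothetical and only mirrors the shape of the documented output; it says nothing about how the service performs the conversion.

```python
import json

# The two rows of the example table, with column names as keys.
rows = [
    {"device_id": "iv9", "speed": 5},
    {"device_id": "rhi", "speed": 10},
]


def rows_to_json_lines(rows):
    """Serialize each row as a compact JSON object, one object per line."""
    return [json.dumps(row, separators=(",", ":")) for row in rows]


json_lines = rows_to_json_lines(rows)
print("\n".join(json_lines))
```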
Serialization on delivery to message queues
When delivering data to a message queue, you can use two types of serialization:
Auto
Automatic selection of serialization settings depending on the source type.
Debezium
Serialization using the Debezium parameters listed below:
- dt.add.original.type.info: Determines whether to add information about the original data types so that the types can be restored after the transfer.
  Exception: PostgreSQL with time zone date and time data types. Time zone information cannot be restored.
  The default value is false.
- dt.mysql.timezone: Time zone for MySQL® date and time data types in IANA format.
  The default value is UTC.
- dt.unknown.types.policy: Policy that determines how user-defined data types are handled.
  The possible values are:
  - skip: Do not abort the transfer and ignore user-defined data types.
  - to_string: Do not abort the transfer and convert user-defined data types to text.
  - fail: Abort the transfer and return an error.
  The default value is skip.
- decimal.handling.mode: Mode for handling real numbers.
  The possible values are:
  - precise: Precise conversion using java.math.BigDecimal.
  - double: Conversion to a double data type. This may result in precision loss.
  - string: Conversion to text.
  The default value is precise.
- interval.handling.mode: Mode for handling time intervals.
  The possible values are:
  - numeric: Approximate conversion to microseconds.
  - string: Precise conversion based on the string template: P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S.
  The default value is numeric.
- key.converter and value.converter: Key and value converters.
  The possible values are:
  - org.apache.kafka.connect.json.JsonConverter: JSON, standard for Debezium.
  - io.confluent.connect.json.JsonSchemaConverter: Confluent Schema Registry.
  The default value is org.apache.kafka.connect.json.JsonConverter.
- key.converter.schemas.enable and value.converter.schemas.enable: Whether to add a schema description to each message for keys and values when using org.apache.kafka.connect.json.JsonConverter.
  The default value is true.
- key.converter.schema.registry.url and value.converter.schema.registry.url: Whether to add a schema description to each message for keys and values when using io.confluent.connect.json.JsonSchemaConverter.
  The possible values are:
  - Empty string (default): Do not add a schema description.
  - URL string value defining the path to the schema registry service.
- key.converter.basic.auth.user.info and value.converter.basic.auth.user.info: Username and password for authorization in Confluent Schema Registry for keys and values when using io.confluent.connect.json.JsonSchemaConverter.
  Value format: <username>:<password>.
- key.converter.ssl.ca and value.converter.ssl.ca: Contents of Confluent Schema Registry's SSL certificate for keys and values when using io.confluent.connect.json.JsonSchemaConverter.
  If the setting value is not specified, the SSL certificate is not verified.
- unavailable.value.placeholder: Value that replaces data if its type is not supported.
  The default value is __debezium_unavailable_value.
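The parameters and defaults above can be collected into a settings map and checked before use. The following Python sketch is an assumption-laden illustration: the names `DEBEZIUM_DEFAULTS`, `ALLOWED_VALUES`, and `build_settings` are hypothetical, and representing every value as a string is a choice made here for simplicity, not a documented requirement. Only parameters with a default or an enumerated set of values stated above are included.

```python
# Defaults as stated in the parameter list above (values kept as strings
# by assumption).
DEBEZIUM_DEFAULTS = {
    "dt.add.original.type.info": "false",
    "dt.mysql.timezone": "UTC",
    "dt.unknown.types.policy": "skip",
    "decimal.handling.mode": "precise",
    "interval.handling.mode": "numeric",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true",
    "key.converter.schema.registry.url": "",
    "value.converter.schema.registry.url": "",
    "unavailable.value.placeholder": "__debezium_unavailable_value",
}

CONVERTERS = {
    "org.apache.kafka.connect.json.JsonConverter",
    "io.confluent.connect.json.JsonSchemaConverter",
}

# Only the parameters whose possible values are enumerated above.
ALLOWED_VALUES = {
    "dt.unknown.types.policy": {"skip", "to_string", "fail"},
    "decimal.handling.mode": {"precise", "double", "string"},
    "interval.handling.mode": {"numeric", "string"},
    "key.converter": CONVERTERS,
    "value.converter": CONVERTERS,
}


def build_settings(overrides):
    """Merge overrides onto the listed defaults and check enumerated values."""
    settings = {**DEBEZIUM_DEFAULTS, **overrides}
    for name, allowed in ALLOWED_VALUES.items():
        if settings[name] not in allowed:
            raise ValueError(
                f"{name}: expected one of {sorted(allowed)}, got {settings[name]!r}"
            )
    return settings


# Example: keep the defaults but serialize real numbers as text.
settings = build_settings({"decimal.handling.mode": "string"})
```

A value outside the enumerated set, such as `build_settings({"dt.unknown.types.policy": "ignore"})`, raises `ValueError` in this sketch.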