Managing data schemas
Apache Kafka® uses a binary format for storing and transmitting messages. Messages do not contain any information about their structure. As a result, to interpret data in binary format, the consumer needs a data format schema that describes the format of data input or output.
Based on the data format schema, the producer generates and the consumer interprets messages from topics. If data format schemas are different across producers and consumers, the application may throw errors because of incorrect message interpretation.
This is why the developers on both the producer and consumer side should:
- Update data format schemas regularly and on time.
- Enable the producer and consumer to support multiple data format schema versions, if required.
To automate handling of data format schemas, a data format schema registry is used. It significantly simplifies working with data, especially when a schema changes over time. The registry automatically checks data version compatibility and ensures the backward compatibility of schema versions.
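As a rough illustration of what a backward compatibility check involves, the sketch below models a schema as a dict of field names and treats a new version as backward compatible if every field it adds has a default value, so data written with the old version can still be read. This rule is a simplifying assumption, not the registry's actual algorithm: the schema layout and the `is_backward_compatible` function are hypothetical, and real registries apply format-specific rules (for example, they also check type changes).

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Return True if a reader on new_fields can read data written with old_fields.

    Simplified rule (an assumption for illustration): every field that exists
    only in the new schema must define a default value.
    """
    for name, spec in new_fields.items():
        if name not in old_fields and "default" not in spec:
            return False
    return True


# Hypothetical schema versions for illustration.
v1 = {"id": {"type": "long"}}
v2_ok = {"id": {"type": "long"}, "email": {"type": "string", "default": ""}}
v2_bad = {"id": {"type": "long"}, "email": {"type": "string"}}

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
```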
How a data format schema registry works
- A producer transmits data format schemas to the registry. The following data schema formats are supported:
  When a schema is placed in the registry:
  - It is assigned a unique version number.
  - The schema and its version are saved in an Apache Kafka® service topic.
- When sending a message, a producer specifies the version number of the desired schema.
- Upon receiving a message, a consumer extracts the version number of the data format schema in it.
- If the required data format schema is missing from the local cache, the consumer looks it up in the registry. After getting the appropriate schema, it correctly interprets the received message.
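The flow above can be sketched with a toy in-memory model. Everything here is hypothetical and only mirrors the described steps: the `SchemaRegistry` and `Consumer` classes, the `produce` helper, and the JSON envelope are assumptions chosen for readability (real messages are binary, and a real registry stores schemas in a service topic).

```python
import json


class SchemaRegistry:
    """Toy in-memory registry that assigns a version number to each schema."""

    def __init__(self):
        self._schemas = {}
        self._next_version = 1

    def register(self, schema: dict) -> int:
        version = self._next_version
        self._schemas[version] = schema
        self._next_version += 1
        return version

    def get(self, version: int) -> dict:
        return self._schemas[version]


def produce(version: int, payload: dict) -> bytes:
    """Producer side: attach the schema version number to the message."""
    return json.dumps({"schema_version": version, "payload": payload}).encode()


class Consumer:
    """Consumer side: resolve schemas through a local cache, then the registry."""

    def __init__(self, registry: SchemaRegistry):
        self._registry = registry
        self._cache = {}
        self.registry_lookups = 0

    def read(self, message: bytes) -> dict:
        envelope = json.loads(message)
        version = envelope["schema_version"]
        if version not in self._cache:
            # Cache miss: fetch the schema from the registry once.
            self._cache[version] = self._registry.get(version)
            self.registry_lookups += 1
        schema = self._cache[version]
        # Interpret the payload using the fields the schema declares.
        return {name: envelope["payload"][name] for name in schema["fields"]}


registry = SchemaRegistry()
version = registry.register({"fields": ["id", "name"]})
consumer = Consumer(registry)
message = produce(version, {"id": 1, "name": "alice"})
first = consumer.read(message)
second = consumer.read(message)  # served from the local cache
```

In this sketch the second `read` call does not touch the registry, which is the point of the consumer-side cache.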
Managed Schema Registry
Managed Service for Apache Kafka® clusters come with a built-in data format schema registry, Managed Schema Registry. The registry is deployed on each cluster broker host and is accessible via HTTPS on port 443.
The Karapace
Schema information is posted to a service topic named __schema_registry. You cannot use regular tools to write data to this topic.
To enable Managed Schema Registry, activate the corresponding option when creating or updating a cluster.
To work with Managed Schema Registry, you need an advanced security group configuration.
Managed Schema Registry subjects
Schemas use the <topic_name>-key or <topic_name>-value subject, depending on whether the schema is registered for a key or a value. The subject specifies the topic to publish messages to.
Subject access depends on permissions granted to the Apache Kafka® user:
- The ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for a specific topic allows the user to manage these subjects: <topic_name>-key, <topic_name>-value, or <topic_name>.
- The ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for the <prefix>* topic allows the user to manage subjects of the same format: <prefix>*. Topic and subject names start with the same prefix.
- The ACCESS_ROLE_ADMIN role allows the user to manage all subjects in a Managed Service for Apache Kafka® cluster.
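The subject rules above can be expressed as a small check. This is a minimal sketch, not the service's implementation: the `can_manage_subject` function and the representation of permissions as `(topic, role)` pairs are assumptions made for illustration.

```python
def can_manage_subject(grants, subject: str) -> bool:
    """Decide whether a user may manage a subject.

    grants is a list of (topic, role) pairs, where topic is either an exact
    topic name or a "<prefix>*" pattern (hypothetical representation).
    """
    for topic, role in grants:
        if role == "ACCESS_ROLE_ADMIN":
            # Admins can manage all subjects in the cluster.
            return True
        if role in ("ACCESS_ROLE_CONSUMER", "ACCESS_ROLE_PRODUCER"):
            if topic.endswith("*"):
                # Prefix grant: the subject must start with the same prefix.
                if subject.startswith(topic[:-1]):
                    return True
            elif subject in (topic, f"{topic}-key", f"{topic}-value"):
                return True
    return False


# Hypothetical grants for illustration.
exact = [("orders", "ACCESS_ROLE_PRODUCER")]
prefixed = [("logs-*", "ACCESS_ROLE_CONSUMER")]

print(can_manage_subject(exact, "orders-key"))         # True
print(can_manage_subject(exact, "payments-key"))       # False
print(can_manage_subject(prefixed, "logs-app-value"))  # True
```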
Authorization in Managed Schema Registry
When working with the Managed Schema Registry API over an SSL connection, you need to configure the same client SSL certificate as for broker host connections.
You also need to authorize API server requests using the Authorization HTTP header.
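As an illustration of building an authorized request, the sketch below prepares (but does not send) an HTTPS request to port 443 with an Authorization header. Several details are assumptions not stated in this text: the Basic scheme with a username and password, the `/subjects` path, the host name `broker.example.internal`, and the `build_registry_request` helper itself.

```python
import base64
import urllib.request


def build_registry_request(host: str, username: str, password: str,
                           path: str = "/subjects") -> urllib.request.Request:
    """Prepare a GET request with an Authorization header (Basic scheme assumed)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}:443{path}",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


# Placeholder host and credentials; replace with real values.
request = build_registry_request("broker.example.internal", "user", "pass")
print(request.full_url)
print(request.get_header("Authorization"))
```

To actually send such a request, you would pass it to `urllib.request.urlopen` together with an `ssl.SSLContext` that trusts the same certificate used for broker host connections.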
Access to schemas depends on the selected topic management method and the configured user roles:
- When using managed topics:
  - A user with the ACCESS_ROLE_PRODUCER role for a topic can perform any operations with subjects associated with that topic.
  - A user with the ACCESS_ROLE_CONSUMER role for a topic can perform read operations with subjects associated with the topic.
  For more information on available subjects, see Managed Schema Registry subjects.
- When using unmanaged topics:
  - The above points mentioned for a cluster with managed topics also apply.
  - In addition, a user with the ACCESS_ROLE_ADMIN role for a topic has access to any operations with subjects related to the topic. This user can be granted access to any topics.
For more information about roles, see User management.
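The role rules above can be summarized as a small decision function. This is a sketch under assumptions: the `can_perform` function and the `"read"`/`"write"` operation labels are hypothetical, and because the text lists ACCESS_ROLE_ADMIN only for unmanaged topics, the sketch treats that role as not applicable to managed topics.

```python
def can_perform(role: str, operation: str, managed_topics: bool) -> bool:
    """Check whether a role permits an operation on a topic's subjects.

    operation is "read" or "write" (hypothetical labels for illustration).
    """
    if role == "ACCESS_ROLE_PRODUCER":
        return True  # any operation with the topic's subjects
    if role == "ACCESS_ROLE_CONSUMER":
        return operation == "read"  # read operations only
    if role == "ACCESS_ROLE_ADMIN":
        # Assumption: the admin rule is described only for unmanaged topics.
        return not managed_topics
    return False


print(can_perform("ACCESS_ROLE_CONSUMER", "write", managed_topics=True))  # False
print(can_perform("ACCESS_ROLE_ADMIN", "write", managed_topics=False))    # True
```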
Confluent Schema Registry
Confluent Schema Registry allows you to store data format schemas in the Apache Kafka® service topic named _schemas.
For more information about the registry, see the Confluent documentation.