Managing data schemas
Apache Kafka® uses a binary format for storing and transmitting messages. Messages do not contain any information about their structure. As a result, to interpret data in binary format, the consumer needs a data format schema that describes the format of data input or output.
Based on the data format schema, the producer generates and the consumer interprets messages from topics. If data format schemas are different across producers and consumers, the application may throw errors because of incorrect message interpretation.
This is why the developers on both the producer and consumer side should:
- Update data format schemas regularly and on time.
- Enable the producer and consumer to support multiple data format schema versions, if required.
To automate handling of data format schemas, a data format schema registry is used. It significantly simplifies working with data, especially when a schema changes over time. The registry automatically checks data version compatibility and ensures the backward compatibility of schema versions.
How a data format schema registry works
- A producer transmits data format schemas to the registry. The following data schema formats are supported:
  When a schema is placed in the registry:
  - It is assigned a unique version number.
  - The schema and its version are saved in an Apache Kafka® service topic.
- When sending a message, a producer specifies the version number of the desired schema.
- Upon receiving a message, a consumer extracts the version number of the data format schema from it.
- If the required data format schema is missing from the local cache, the consumer looks it up in the registry. After getting the appropriate schema, it correctly interprets the received message.
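The flow above can be sketched with a small in-memory model. This is an illustration only, not a real registry client: the InMemoryRegistry and Consumer classes, the 4-byte version prefix, and the JSON message body are all assumptions made for the example and do not describe the actual wire format.

```python
import json
import struct


class InMemoryRegistry:
    """Hypothetical stand-in for a schema registry (not a real client)."""

    def __init__(self):
        self._schemas = {}
        self._next_version = 1

    def register(self, schema):
        # Each registered schema is assigned a unique version number.
        version = self._next_version
        self._schemas[version] = schema
        self._next_version += 1
        return version

    def get(self, version):
        return self._schemas[version]


def encode(version, payload):
    # Assumed layout: 4-byte big-endian schema version, then a JSON body.
    return struct.pack(">I", version) + json.dumps(payload).encode("utf-8")


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}   # local schema cache, keyed by version
        self.lookups = 0  # number of times the registry was queried

    def decode(self, message):
        # Extract the schema version from the message.
        (version,) = struct.unpack(">I", message[:4])
        # Query the registry only if the schema is not cached locally.
        if version not in self.cache:
            self.cache[version] = self.registry.get(version)
            self.lookups += 1
        schema = self.cache[version]
        record = json.loads(message[4:].decode("utf-8"))
        # Interpret the body using the field names listed in the schema.
        return {field: record[field] for field in schema["fields"]}


registry = InMemoryRegistry()
version = registry.register({"fields": ["id", "name"]})
message = encode(version, {"id": 1, "name": "alice"})

consumer = Consumer(registry)
first = consumer.decode(message)
second = consumer.decode(message)  # served from the local cache
```

Decoding the same message twice queries the registry only once, because the second call finds the schema in the local cache.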
Managed Schema Registry
Managed Service for Apache Kafka® clusters come with a built-in data format schema registry, Managed Schema Registry. The registry is deployed on each cluster broker host and is accessible via HTTPS on port 443.
Managed Schema Registry is based on Karapace, an open-source schema registry.
Karapace is deployed on each broker host on a separate port, with its own endpoint available for connection. When you delete a broker, the corresponding endpoint becomes unavailable.
Schema information is posted to a service topic named __schema_registry. You cannot use regular tools to write data to this topic.
To enable schema management, activate the corresponding option when creating or updating a cluster.
To work with Managed Schema Registry, you need an advanced security group configuration.
Managed Schema Registry subjects
Schemas are registered under subjects named <topic_name>-key or <topic_name>-value, depending on whether the schema is registered for a key or a value. The subject specifies the topic to publish messages to.
Subject access depends on permissions granted to the Apache Kafka® user:
- With the ACCESS_ROLE_SCHEMA_READER or ACCESS_ROLE_SCHEMA_WRITER role for particular subjects, the user can manage only these subjects.
- With the ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for a particular topic, the user can manage the following subjects: <topic_name>-key, <topic_name>-value, and <topic_name>.
- With the ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for a topic formatted as <prefix>*, the user can manage subjects of the same <prefix>* format. Topic and subject names start with the same prefix.
- With the ACCESS_ROLE_TOPIC_ADMIN role for a topic formatted as <prefix>*, the user can manage subjects of the same <prefix>* format. Topic and subject names start with the same prefix.
- The ACCESS_ROLE_ADMIN role allows the user to manage all subjects in the Managed Service for Apache Kafka® cluster.
Learn more about the permissions you get with each role.
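The topic-to-subject mapping described above can be sketched as a simple check. This is a minimal illustration, not the service's actual authorization logic: the function names and the example topic names are hypothetical, and only the topic-level rules (exact topic name or <prefix>* pattern) are modeled.

```python
def subjects_for_topic(topic):
    """Subjects a user can manage when granted a role for an exact topic name."""
    return [f"{topic}-key", f"{topic}-value", topic]


def can_manage_subject(topic_grant, subject):
    """Check a subject against a topic grant.

    topic_grant is either an exact topic name or a '<prefix>*' pattern
    (hypothetical helper for illustration).
    """
    if topic_grant.endswith("*"):
        # Prefix grant: the subject name must start with the same prefix.
        return subject.startswith(topic_grant[:-1])
    # Exact grant: only the three subjects derived from the topic name.
    return subject in subjects_for_topic(topic_grant)


exact = can_manage_subject("orders", "orders-value")
other = can_manage_subject("orders", "orders-v2-key")
prefixed = can_manage_subject("orders*", "orders-v2-key")
```

Here a grant for the orders topic covers orders-value but not orders-v2-key, while a grant for orders* covers both.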
Authorization in Managed Schema Registry
When working with the Managed Schema Registry API over an SSL connection, you need to configure the same client SSL certificate as for broker host connections.
You also need to authorize API server requests using the Authorization HTTP header.
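As a rough sketch, a request carrying the Authorization header could be prepared as shown below. Several things here are assumptions rather than facts from this page: the HTTP Basic scheme with a user name and password, the broker.example.invalid host name, the credentials, and the /subjects path, which is taken from the Confluent-compatible registry API. The request is only built, not sent.

```python
import base64
import urllib.request


def build_request(base_url, path, username, password):
    """Build (but do not send) a registry API request with an Authorization header.

    Assumes HTTP Basic authorization; check the service documentation
    for the scheme that is actually required.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(f"{base_url}{path}")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, used only for illustration.
request = build_request(
    "https://broker.example.invalid:443", "/subjects", "user", "secret"
)

# Sending the request would also require an SSL context that trusts the
# same certificate as the one used for broker host connections.
```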
Access to schemas depends on the selected topic management method and the configured user roles:
- When using managed topics:
  - A user with the ACCESS_ROLE_PRODUCER role for a topic can perform any operations with subjects associated with that topic.
  - A user with the ACCESS_ROLE_CONSUMER role for a topic can perform read operations with subjects associated with that topic.
  For more information on available subjects, see Subjects in Managed Schema Registry.
- When using unmanaged topics:
  - The rules listed above for a cluster with managed topics also apply.
  - In addition, a user with the ACCESS_ROLE_ADMIN role for a topic can perform any operations with subjects related to that topic. This user can be granted access to any topics.
For more information about roles, see User management.
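The role rules in this section can be summarized as a lookup table. This is a simplified sketch: reducing "any operations" to the two categories read and write is an assumption, and the is_allowed helper is hypothetical. Per the text above, the ACCESS_ROLE_ADMIN entry applies to clusters with unmanaged topics.

```python
# Operation kinds each role allows on subjects associated with its topic.
# "read"/"write" is an assumed simplification of "any operations".
ROLE_OPERATIONS = {
    "ACCESS_ROLE_PRODUCER": {"read", "write"},
    "ACCESS_ROLE_CONSUMER": {"read"},
    # Applies to clusters with unmanaged topics, per the section above.
    "ACCESS_ROLE_ADMIN": {"read", "write"},
}


def is_allowed(roles, operation):
    """Return True if any of the user's roles permits the operation kind."""
    return any(operation in ROLE_OPERATIONS.get(role, set()) for role in roles)


consumer_can_write = is_allowed(["ACCESS_ROLE_CONSUMER"], "write")
producer_can_write = is_allowed(["ACCESS_ROLE_PRODUCER"], "write")
```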
Use cases
- Managing data schemas in Managed Service for Apache Kafka®
- Working with the managed schema registry
- Working with the managed data format schema registry via the REST API
Confluent Schema Registry
Confluent Schema Registry allows you to store data format schemas in the Apache Kafka® service topic named _schemas.
For more information about the registry, see the Confluent documentation.