Managing data schemas

Updated at December 4, 2024

How data format schema registry works
Managed Schema Registry
Managed Schema Registry subjects
Authorization in Managed Schema Registry
Confluent Schema Registry
See also

Apache Kafka® uses a binary format for storing and transmitting messages. Messages carry no information about their own structure, so to interpret binary data the consumer needs a data format schema that describes the input or output data.

The producer generates messages and the consumer interprets them according to the data format schema. If producers and consumers use different data format schemas, the application may throw errors because messages are interpreted incorrectly.

This is why developers on both the producer and consumer sides should:

- Update data format schemas regularly and on time.
- Enable the producer and consumer to support multiple data format schema versions, if required.

A data format schema registry automates the handling of data format schemas. It significantly simplifies working with data, especially when a schema changes over time. The registry automatically checks whether schema versions are compatible and ensures their backward compatibility.

How data format schema registry works

1. A producer transmits data format schemas to the registry. The following data schema formats are supported:
   - Avro.
   - JSON Schema.
   - Protobuf.
2. When a schema is placed in the registry:
   - It is assigned a unique version number.
   - The schema and its version are saved in an Apache Kafka® service topic.
3. When sending a message, a producer specifies the version number of the desired schema.
4. Upon receiving a message, a consumer extracts the version number of the data format schema from it.
5. If the required data format schema is missing from the local cache, the consumer looks it up in the registry. After getting the appropriate schema, it correctly interprets the received message.
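
The consumer-side steps above can be sketched as follows. This is a minimal illustration, not the registry client itself. It assumes the Confluent-compatible wire format, in which the schema reference travels as a 4-byte big-endian identifier after a zero "magic" byte. `split_message`, `SchemaCache`, and `fetch_schema` are hypothetical names. In a real client, `fetch_schema` would be an HTTPS request to the registry.

```python
import struct
from typing import Callable, Dict, Tuple

MAGIC_BYTE = 0  # assumed first byte of a framed message


def split_message(raw: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, payload) extracted from a framed message."""
    if len(raw) < 5 or raw[0] != MAGIC_BYTE:
        raise ValueError("message is not in the expected framed format")
    (schema_id,) = struct.unpack(">I", raw[1:5])
    return schema_id, raw[5:]


class SchemaCache:
    """Local schema cache that asks the registry only on a cache miss."""

    def __init__(self, fetch_schema: Callable[[int], str]) -> None:
        self._fetch_schema = fetch_schema  # hypothetical registry lookup
        self._schemas: Dict[int, str] = {}

    def get(self, schema_id: int) -> str:
        if schema_id not in self._schemas:
            # Cache miss: look the schema up in the registry once.
            self._schemas[schema_id] = self._fetch_schema(schema_id)
        return self._schemas[schema_id]
```

Keeping the lookup behind a callable means the cache can be exercised without a live registry, and repeated messages with the same identifier trigger only one registry request.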

Managed Schema Registry

Managed Service for Apache Kafka® clusters come with a built-in data format schema registry, Managed Schema Registry. The registry is deployed on each cluster broker host and is accessible via HTTPS on port 443.

Managed Schema Registry is implemented with the open-source tool Karapace. The Karapace API is compatible with the Confluent Schema Registry API with only minor exceptions. API requests require authentication.

Schema information is posted to a service topic named __schema_registry. You cannot use regular tools to write data to this topic.

To enable data schema management, activate the corresponding option when creating or updating a cluster.

To work with Managed Schema Registry, you need an advanced security group configuration.

Managed Schema Registry subjects

The schemas use subjects, i.e., names they are registered under. To write and read schemas, Apache Kafka® uses the <topic_name>-key or <topic_name>-value subjects, depending on whether the schema is registered for a key or a value. The subject specifies the topic to publish messages to.
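
The naming rule above can be expressed as a small helper. This is only an illustrative sketch, and `subject_name` is a hypothetical function name.

```python
def subject_name(topic: str, is_key: bool = False) -> str:
    """Build the subject under which a topic's key or value schema is registered."""
    if not topic:
        raise ValueError("topic name must not be empty")
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"
```

For a topic named `orders`, the helper returns `orders-value` for the value schema and `orders-key` when `is_key=True`.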

Subject access depends on permissions granted to the Apache Kafka® user:

- The ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for a specific topic allows the user to manage these subjects: <topic_name>-key, <topic_name>-value, and <topic_name>.
- The ACCESS_ROLE_CONSUMER or ACCESS_ROLE_PRODUCER role for a topic formatted as <prefix>* allows the user to manage subjects with the same <prefix>* format. Topic and subject names start with the same prefix.
- The ACCESS_ROLE_ADMIN role allows the user to manage all subjects in a Managed Service for Apache Kafka® cluster.
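
The subject access rules above can be modeled as a short check. This is a sketch of the rules as described in this section, not the service's actual authorization code. `can_manage_subject` and its parameters are hypothetical names.

```python
from typing import Iterable


def can_manage_subject(subject: str, granted_topics: Iterable[str],
                       is_admin: bool = False) -> bool:
    """Check whether a user may manage a subject, given the topics their roles cover."""
    if is_admin:
        # ACCESS_ROLE_ADMIN covers every subject in the cluster.
        return True
    for topic in granted_topics:
        if topic.endswith("*"):
            # Role granted for "<prefix>*": subject must start with the same prefix.
            if subject.startswith(topic[:-1]):
                return True
        elif subject in (topic, f"{topic}-key", f"{topic}-value"):
            # Role granted for a specific topic: only its three subjects match.
            return True
    return False
```

With a role granted for the topic `orders`, the subjects `orders`, `orders-key`, and `orders-value` pass the check, while `orders-extra` does not.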

Authorization in Managed Schema Registry

When working with the Managed Schema Registry API over an SSL connection, you need to configure the same client SSL certificate as for broker host connections.

You also need to authorize API server requests using the Authorization HTTP header. In this header, specify the username and password of the Apache Kafka® user.
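
As an illustration, an authorized request might be prepared as shown below. The sketch assumes the HTTP Basic scheme for the Authorization header. It also assumes the `POST /subjects/<subject>/versions` endpoint and the `application/vnd.schemaregistry.v1+json` content type of the Confluent Schema Registry API, which Karapace is stated to be compatible with. The function names, host, and credentials are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode Apache Kafka user credentials (Basic scheme is an assumption)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_register_request(host: str, subject: str, schema: dict,
                           username: str, password: str) -> urllib.request.Request:
    """Prepare, but do not send, a request that registers an Avro schema."""
    # The schema itself is passed as a JSON-encoded string inside the body.
    body = json.dumps({"schemaType": "AVRO", "schema": json.dumps(schema)})
    subject_path = urllib.parse.quote(subject, safe="")
    return urllib.request.Request(
        url=f"https://{host}:443/subjects/{subject_path}/versions",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/vnd.schemaregistry.v1+json",
        },
    )
```

To send the request, pass it to `urllib.request.urlopen` together with an `ssl.SSLContext` that uses the same certificate as your broker host connections.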

Access to schemas depends on the selected topic management method and the configured user roles:

When using managed topics:
- A user with the ACCESS_ROLE_PRODUCER role for a topic can perform any operations with subjects associated with that topic.
- A user with the ACCESS_ROLE_CONSUMER role for a topic can perform read operations with subjects associated with that topic.

For more information on available subjects, see Managed Schema Registry subjects.

When using unmanaged topics:
- The rules listed above for managed topics also apply.
- In addition, a user with the ACCESS_ROLE_ADMIN role for a topic can perform any operations with subjects related to that topic. This user can be granted access to any topics.

For more information about roles, see User management.

Confluent Schema Registry

Confluent Schema Registry is one of the software solutions that help keep data format schemas synchronized between producers and consumers.

Confluent Schema Registry allows you to store data format schemas in the Apache Kafka® service topic named _schemas.

For more information about the registry, see the Confluent documentation.
