I am using Kafka Connect from Confluent to consume Kafka stream and write to HDFS in parquet format. I am using Schema Registry service in 1 node and it is running fine. Now I want to distribute Schema Registry to cluster mode to handle fail over. Any link or snippet on how to achieve that will be very useful.
It is hard to find, but we covered this architecture in our documentation: http://docs.confluent.io/3.0.0/schema-registry/docs/deployment.html#multi-dc-setup
To quote from the docs a bit (although you should read the docs, lots of good architecture advice and a recovery runbook are included):
Assuming you have Schema Registry running, here are the recommended steps to add Schema Registry instances in a new “slave” datacenter (call it DC B):
In DC B, make sure Kafka has unclean.leader.election.enable set to false. In Kafka in DC B, create the _schemas topic. It should have 1 partition, kafkastore.topic.replication.factor of 3, and min.insync.replicas at least 2. In DC B, run MirrorMaker with Kafka in the “master” datacenter as the source and Kafka in DC B as the target. In the Schema Registry config files in DC B, set kafkastore.connection.url and schema.registry.zk.namespace to match the instances already running, and set master.eligibility to false. Start your new Schema Registry instances with these configs.
I used confluent schema-registry docker image to form the cluster.
docker run --restart always -d -p 8081:8081 --name=schema-registry-1 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-1 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-2 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-2 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-3 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-3 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
Once this is up and running I verified if schema-registry cluster is formed and if its leader election is successful, by checking the zookeeper contents.
$ docker exec -it zookeeper bash
# /usr/bin/zookeeper-shell localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[schema_registry, cluster, controller, brokers, zookeeper, admin, isr_change_notification, log_dir_event_notification, controller_epoch, kafka-manager, CruiseControlBrokerList, consumers, latest_producer_id_block, config]
[zk: localhost:2181(CONNECTED) 1] ls /schema_registry
[schema_registry_master, schema_id_counter]
[zk: localhost:2181(CONNECTED) 4] get /schema_registry/schema_registry_master
{"host":"schema-registry-1","port":8081,"master_eligibility":true,"scheme":"http","version":1}
#
Hope this helps.
You just need to put this in the connect-avro-distributed.properties to use multi schema registry:
key.converter.schema.registry.url=http://node1:8081,http://node2:8081
value.converter.schema.registry.url=http://node1:8081,http://node2:8081
Hope this is useful for you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With