What's the purpose of using Zookeeper rather than just databases for managing distributed systems?

Tags: java, apache-zookeeper, distributed-computing

I’m learning Zookeeper, and so far I don't understand what it solves for distributed systems that a database can't.

The use cases I’ve read about are implementing locks, barriers, etc. for distributed systems by having Zookeeper clients read from and write to Zookeeper servers. Can’t the same be achieved by reading from and writing to a database?

For example, my book says the way to implement a lock with Zookeeper is to have each client that wants to acquire the lock create an ephemeral znode with the sequential flag set under the lock-znode. The lock is then owned by the client whose child znode has the lowest sequence number.
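To make the recipe concrete, here is a minimal sketch in Java of just the decision step, with no Zookeeper calls. It assumes the child names have already been fetched from the lock-znode and that each name ends in the zero-padded 10-digit sequence number Zookeeper appends to sequential znodes. The class name `LockRecipe`, its method names, and the `lock-` prefix are hypothetical. The `nodeToWatch` helper goes slightly beyond the description above: the usual form of the recipe has each waiting client watch only the node just ahead of it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Decision logic of the lock recipe only; a real client would get the names from Zookeeper. */
final class LockRecipe {

    /** Parses the sequence number, assumed to be the last 10 characters of the child name. */
    static int sequenceOf(String childName) {
        return Integer.parseInt(childName.substring(childName.length() - 10));
    }

    /** Returns a copy of the children sorted by sequence number, lowest first. */
    static List<String> sorted(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        copy.sort(Comparator.comparingInt(LockRecipe::sequenceOf));
        return copy;
    }

    /** The lock is owned by the child with the lowest sequence number. */
    static boolean ownsLock(String myNode, List<String> children) {
        return !children.isEmpty() && sorted(children).get(0).equals(myNode);
    }

    /** The node a waiting client would watch: the one just ahead of it; empty if it owns the lock. */
    static Optional<String> nodeToWatch(String myNode, List<String> children) {
        List<String> order = sorted(children);
        int index = order.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException(myNode + " is not a child of the lock znode");
        }
        return index == 0 ? Optional.empty() : Optional.of(order.get(index - 1));
    }
}
```

With the children `lock-0000000012`, `lock-0000000007` and `lock-0000000009`, the client that created `lock-0000000007` owns the lock, and the client that created `lock-0000000012` would watch `lock-0000000009`.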

All other Zookeeper examples in the book are again just using it to store/retrieve values.

It seems the only thing that distinguishes Zookeeper from a database or any other storage is the “watcher” concept. But that can be built using something else.

I know my simplified view of Zookeeper is a misunderstanding. So can someone tell me what Zookeeper truly provides that a database/custom watcher can’t?

asked Mar 30 '16 14:03

Glide

3 Answers

“Can’t the same be achieved by read/write to databases?”

In theory, yes, it is possible, but it is usually not a good idea to use databases for demanding distributed-coordination use cases. I have seen microservices use relational databases to manage distributed locks with very bad consequences (e.g. thousands of deadlocks in the databases), which in turn resulted in poor DBA-developer relations :-)

Zookeeper has some key characteristics that make it a good candidate for managing application metadata:

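Read capacity can be scaled horizontally by adding new nodes to the ensemble (writes still go through the leader and a quorum, so they do not scale the same way)
Writes are linearizable, while a read may return slightly stale data from the server the client is connected to; that staleness is bounded in time. A client that needs a more up-to-date view can call sync before reading, at a higher cost (Zookeeper is a CP system in CAP terms)
Ordering guarantee: updates are applied in a single total order, and all clients are guaranteed to see them in the order in which they were written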

All of the above could be achieved with databases, but only with significant effort from application clients. Watches and ephemeral nodes could also be emulated in a database with techniques such as triggers and timeouts, but these are often considered inefficient or antipatterns.
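To illustrate the timeout technique, here is a minimal sketch in Java of a lease-based lock of the kind an application would have to build on top of a database to approximate an ephemeral node. The in-memory map stands in for a database table, the names `LeaseTable`, `tryAcquire` and `renew` are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a lock table with lease expiry; not a real database client. */
final class LeaseTable {

    /** One row: who holds the lock and when the lease runs out. */
    private static final class Lease {
        final String owner;
        final long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> rows = new HashMap<>();
    private final long ttlMillis;

    LeaseTable(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Takes the lock if it is free, expired, or already held by this owner. */
    synchronized boolean tryAcquire(String lock, String owner, long nowMillis) {
        Lease current = rows.get(lock);
        boolean free = current == null || current.expiresAtMillis <= nowMillis;
        if (free || current.owner.equals(owner)) {
            rows.put(lock, new Lease(owner, nowMillis + ttlMillis));
            return true;
        }
        return false;
    }

    /** Heartbeat: the holder must call this before the lease expires, or it loses the lock. */
    synchronized boolean renew(String lock, String owner, long nowMillis) {
        Lease current = rows.get(lock);
        if (current == null || !current.owner.equals(owner) || current.expiresAtMillis <= nowMillis) {
            return false;
        }
        rows.put(lock, new Lease(owner, nowMillis + ttlMillis));
        return true;
    }
}
```

Even in this toy form, the application has to choose the lease length, run the heartbeat, and cope with a holder that misses a renewal. With ephemeral nodes, that bookkeeping is tied to the client's Zookeeper session instead.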

Relational databases offer strong transactional guarantees, which usually come at a cost but are often not required for managing application metadata. So it makes sense to look for a more specialized solution such as Zookeeper or Chubby.

Also, Zookeeper keeps its entire data tree in memory (which limits how much data it can hold, and therefore its use cases), so reads are very fast. This is usually not the case with most databases.

answered Oct 16 '22 01:10

senseiwu

I think you're asking yourself the wrong question when you try to figure out the purpose of Zookeeper. Instead of asking what Zookeeper can do that "databases" cannot (btw, Zookeeper is also a database), ask what Zookeeper is better at than other available databases. If you start from that question, you will hopefully understand why people decide to use Zookeeper in their distributed services.

Take ephemeral nodes, for example. The huge benefit of using them is not that they make a much better lock than some other approach. The benefit is that they are removed automatically when the client's session ends, for instance because the client lost its connection to Zookeeper.

Then we can look at the CAP theorem, where Zookeeper most closely resembles a CP system. Once again, you must decide whether that is what you want out of your database.

tldr: Zookeeper is better in some aspects and worse in others compared to other databases.

answered Oct 16 '22 02:10

Petter

Late to the party. Just to offer another thought:

Yes, it's quite common to use a SQL database for server coordination in production. However, you will likely be asked to build an HA (high availability) system, right? So your SQL DB has to be HA too. That means you need a leader-follower architecture (a follower SQL DB), the follower has to be promoted to leader if the leader dies (MHA nodes + manager), and when the previous leader comes back to life it must learn that it is no longer the leader. These problems have answers, but setting them up costs engineering effort. That is why Zookeeper was invented.

I sometimes think of Zookeeper as a simplified version of an HA SQL cluster with a subset of the functionality.

Similarly, why do people choose NoSQL over SQL? With proper partitioning, SQL can also scale well, right? So why NoSQL? One motivation is to reduce the effort of handling node failures. When a NoSQL node dies, the cluster can automatically fall back to another node and even trigger data migration. But if one of your SQL partition leaders dies, it usually requires manual intervention. SQL vs. Zookeeper is similar: someone has coded up the HA + failover logic for you, so we can sit back, hopefully, when the inevitable node failures happen.

answered Oct 16 '22 01:10

SexyNerd
