
Difference between ZooKeeper and a managed replicated database service

I just came across ZooKeeper and am wondering what the difference is between ZooKeeper and an available, consistent, durable, distributed, replicated database service like AWS DynamoDB, or even AWS S3 (a storage service) for that matter. Key features such as configuration management and distributed synchronization can seemingly be achieved with a database offering like DynamoDB. I understand that there are architectural differences between ZooKeeper and products like DynamoDB, but from a feature perspective, are there any major differences between the two? Is there any reason to use ZooKeeper over the other?

asked May 20 '18 18:05 by roger




1 Answer

In a nutshell, ZooKeeper is a kernel for distributed systems: it provides low-level primitives on top of which you can build more complex distributed systems.

1) ZooKeeper delivers messages in order, which is essential for distributed locks (and for distributed systems in general). DynamoDB does not provide a per-client ordered-message guarantee.

2) Sequential znodes provide an atomic way to add elements in order under a common prefix string. Combined with ephemeral znodes and ordered notifications, they let you build notification-driven primitives such as locks and leader election.

Let's say you want to lock customerABCD to perform some work. Every machine can call Create('/customerABCD/lock-', Sequential). If two machines perform this Create, the resulting znodes will be /customerABCD/lock-1 and /customerABCD/lock-2.

To decide who is the leader, each machine can simply list the children of /customerABCD (written Get('/customerABCD') here) and pick the one with the lowest sequence number. Now suppose the machine that created lock-1 dies. Assuming the lock znodes were also created as ephemeral, ZooKeeper removes lock-1, the machine that created lock-2 receives a notification from ZooKeeper, and it can then claim ownership of customerABCD. More examples of such distributed tasks are in https://learning.oreilly.com/library/view/zookeeper/9781449361297/ch02.html
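The lock recipe above can be sketched without a running ZooKeeper ensemble. The following is a toy, in-memory model in Python, not the ZooKeeper API: the class ToyCoordinator and all of its method names are made up for illustration, and it assumes the lock znodes are both sequential and ephemeral.

```python
from collections import defaultdict


class ToyCoordinator:
    """In-memory stand-in for the few ZooKeeper features the lock recipe uses.

    Hypothetical class for illustration only; it is not the ZooKeeper API.
    """

    def __init__(self):
        self._counters = defaultdict(int)   # next sequence number per parent
        self._children = defaultdict(dict)  # parent -> {child name: owner}
        self._watchers = defaultdict(list)  # parent -> callbacks on deletion

    def create_sequential(self, parent, prefix, owner):
        """Create parent/prefix<n>; n increases with every call on that parent."""
        self._counters[parent] += 1
        # Real ZooKeeper zero-pads the suffix to 10 digits, so names sort correctly.
        name = f"{prefix}{self._counters[parent]:010d}"
        self._children[parent][name] = owner
        return name

    def get_children(self, parent):
        return sorted(self._children[parent])

    def watch(self, parent, callback):
        self._watchers[parent].append(callback)

    def session_died(self, owner):
        """Drop the dead owner's (ephemeral) znodes, then notify watchers."""
        for parent, children in list(self._children.items()):
            dead = [name for name, o in children.items() if o == owner]
            for name in dead:
                del children[name]
            if dead:
                for callback in list(self._watchers[parent]):
                    callback(parent)


def lock_holder(zk, parent):
    """The znode with the lowest sequence number holds the lock."""
    children = zk.get_children(parent)
    return children[0] if children else None


zk = ToyCoordinator()
first = zk.create_sequential("/customerABCD", "lock-", owner="machine-A")
second = zk.create_sequential("/customerABCD", "lock-", owner="machine-B")

# machine-B asks to be told when something under /customerABCD disappears.
events = []
zk.watch("/customerABCD", lambda parent: events.append(lock_holder(zk, parent)))

holder_before = lock_holder(zk, "/customerABCD")
zk.session_died("machine-A")  # machine-A crashes; its znode vanishes
holder_after = lock_holder(zk, "/customerABCD")
print(holder_before, "->", holder_after)
```

The waiting machine never asks "is it my turn yet?" in a loop. It is told once, at the moment the lower-numbered znode goes away.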

With DynamoDB, the machine that wrote the equivalent of /customerABCD/lock-2 has to poll to find out whether the lock is still held. This is a slow way to acquire a lock: it relies on timeout-based polling, it costs compute on the waiting machine, and it adds polling load to the system.
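To make the cost of polling concrete, here is a small simulation in Python with made-up numbers (a lock released at t=95, a poll every 10 time units). Both functions are hypothetical helpers; neither calls DynamoDB or ZooKeeper.

```python
def poll_until_free(release_time, interval):
    """Return (number of checks, time at which the poller notices the release)."""
    checks, now = 0, 0
    while True:
        checks += 1                 # every check is a request to the store
        if now >= release_time:
            return checks, now
        now += interval             # sleep until the next poll


def notified_when_free(release_time):
    """A notification arrives once, at the moment the lock is released."""
    return 1, release_time


poll_checks, poll_seen = poll_until_free(release_time=95, interval=10)
watch_checks, watch_seen = notified_when_free(release_time=95)
print(poll_checks, poll_seen, watch_checks, watch_seen)
```

Under these assumptions the poller issues 11 requests and notices the release at t=100, while the notified waiter does one unit of work at t=95. A shorter interval reduces the delay but adds more requests.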

3) Every change to ZooKeeper state, such as adding or removing a znode, increments the zxid (the ZooKeeper transaction id). This forms the basis of versioning, which distributed systems can use to implement locks with fencing, as explained in the section "Making the lock safe with fencing" of https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

Again, DynamoDB does not seem to have a comparable built-in, automatically incremented sequence number.
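The fencing idea from point 3 can be sketched as follows. This is a minimal Python illustration with a hypothetical FencedStore class: the token stands in for a monotonically increasing number such as the zxid, and the values 33 and 34 are just example tokens.

```python
class FencedStore:
    """Toy storage service that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = -1   # largest token seen so far
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False          # an older lock holder woke up late: refuse
        self.highest_token = token
        self.value = value
        return True


store = FencedStore()
# Client 1 got the lock with token 33 and then stalled (for example a long GC pause).
# Its lock expired, client 2 got the lock with token 34 and wrote first.
ok_new = store.write(34, "from client 2")
# Client 1 wakes up, still believing it holds the lock, and tries to write.
ok_stale = store.write(33, "from client 1")
print(ok_new, ok_stale, store.value)
```

The check lives in the storage service, not in the lock service, so it only works if tokens never decrease. That is the property an ever-increasing transaction id provides.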

answered Sep 21 '22 02:09 by WebServer