I have been having trouble finding an example of what use cases are suitable for Vector Clocks and Version Vectors, and how they might differ. I understand that they largely work in the same way, with Vector Clocks using receive and send functions, and Version Vectors using a sync function instead, but I do not understand the differences between the two options. Is it just two different ways of expressing the same thing, or are there real differences in use cases between them?
I was only able to find one question that was somewhat related: "When do I use a consensus algorithm like Paxos vs using a something like a Vector Clock?"
Even though the linked answer states the following, and references a short article, the differences are still unclear to me.
You might want to use a version vector for a leaderless distributed storage. You might use vector clocks for the same (although it's a worse fit; the article also suggests you use it for consistent snapshots, for implementing causal ordering in general distributed systems etc).
A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's logical clock.
Vector clocks extend Lamport timestamps in that they guarantee the strong clock consistency condition, which (in addition to the clock consistency condition) dictates that if one event's clock comes before another's, then that event happened before the other; that is, it is a two-way condition.
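To make the send and receive rules concrete, here is a minimal sketch of a vector clock for a fixed group of processes. The class and method names (VectorClock, tick, send, receive) are my own for illustration and are not taken from any library, and I am assuming the common convention that both sending and receiving count as events.

```python
from typing import List


class VectorClock:
    """Vector clock for one process in a fixed group of n processes (illustrative)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid          # index of this process in the vector
        self.clock = [0] * n    # one counter per process

    def tick(self) -> List[int]:
        """Local event: increment only this process's own entry."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def send(self) -> List[int]:
        """Sending is an event; the returned copy travels with the message."""
        return self.tick()

    def receive(self, msg_clock: List[int]) -> List[int]:
        """Merge with the element-wise max, then count the receive as an event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        return self.tick()


p0 = VectorClock(pid=0, n=2)
p1 = VectorClock(pid=1, n=2)

p1.tick()                   # p1 is now [0, 1]
stamp = p0.send()           # p0 is now [1, 0]; the message carries [1, 0]
after = p1.receive(stamp)   # max([0, 1], [1, 0]) = [1, 1], then tick -> [1, 2]
print(stamp, after)
```

The point to notice is that every event moves a counter, so the numbers themselves carry information about how much has happened at each process.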
Comparing vector timestamps: vectors are compared by comparing their values element by element. That is, we compare the values of P0, then P1, etc. Two vector timestamps are equal if each corresponding element of one vector is the same as the other.
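The element-by-element comparison can be sketched as follows. The function name compare and the four result labels are my own choices for illustration. The excerpt above only spells out equality; the "before", "after" and "concurrent" cases follow the usual definition of the partial order on vector timestamps.

```python
from typing import Sequence


def compare(a: Sequence[int], b: Sequence[int]) -> str:
    """Compare two vector timestamps of equal length, element by element."""
    if len(a) != len(b):
        raise ValueError("vector timestamps must have the same length")
    a_le_b = all(x <= y for x, y in zip(a, b))
    a_ge_b = all(x >= y for x, y in zip(a, b))
    if a_le_b and a_ge_b:
        return "equal"        # every element matches
    if a_le_b:
        return "before"       # a happened before b
    if a_ge_b:
        return "after"        # b happened before a
    return "concurrent"       # neither dominates: no causal order


print(compare([1, 2], [1, 2]))   # equal
print(compare([1, 0], [1, 2]))   # before
print(compare([2, 0], [0, 1]))   # concurrent
```

The "concurrent" result is what makes this a partial order rather than a total one: some pairs of events simply cannot be ordered.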
A: A vector clock is not a snapshot algorithm. It can be used to reason about which messages are in transit and save them, and you can still get consistency if you want. But since you still need to keep the state of memory and the state of the network, a marker algorithm is a much simpler way to do this.
Same question here, and it's still not absolutely clear to me, but what I've found is that version vectors are more suitable to determine the causality of events in a specific network of replicated nodes in a distributed system, where the only thing you are interested in is what happened first and what happened after.
By contrast, a vector clock determines event order in an undetermined sequence of events in a distributed system.
In that sense, using integers for version vectors is overly complicated if we just want to determine which node, A or B, is more up to date. Take a situation where initially A[2,2] and B[2,2] (therefore in sync).
From the version vector perspective, A[3,2] > B[2,2] means the same as A[10,2] > B[2,2]. That would explain why we can use a fixed set of values for version vectors and why the only important operation is syncing versions.
From the vector clock perspective, there is a difference between A[10,2] and A[3,2]. It means that +7 events happened in the meantime. That would explain why we need to keep track of all the events and there are send and receive operations to sync all the vector clocks in the network.
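Here is a small sketch of that reading, using the A/B numbers from above. The helper names dominates and sync are hypothetical, and I am assuming that sync is an element-wise max merge between two replicas, which is how I understand version vectors to work. Treat it as an illustration of my interpretation, not a reference implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if replica state a is strictly newer than b (a >= b everywhere, a != b)."""
    return all(x >= y for x, y in zip(a, b)) and list(a) != list(b)


def sync(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Assumed sync rule: both replicas end up with the element-wise max."""
    return [max(x, y) for x, y in zip(a, b)]


b = [2, 2]

# Version vector view: only the relation matters, not the size of the gap.
print(dominates([3, 2], b))    # True
print(dominates([10, 2], b))   # True

# After a sync, B simply adopts the newer state in either case.
print(sync([3, 2], b))         # [3, 2]
print(sync([10, 2], b))        # [10, 2]

# Vector clock view: the gap itself is meaningful (7 more events at A).
print([10, 2][0] - [3, 2][0])  # 7
```

Note that there is no separate send or receive step here; the replicas only ever compare and merge their vectors.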
Anyway, like you, I'm still missing a clear document that explains the difference between the two and when to use one rather than the other.