What's the difference between majority committed data and snapshot of majority committed data and should I care about it? If the two concepts are totally different, when and how to choose one over the other?
I found these descriptions while reading the reference on transactions in MongoDB: https://docs.mongodb.com/manual/core/transactions/#transaction-options-read-concern-write-concern-read-preference, but I cannot understand the difference between readConcern: majority and readConcern: snapshot.
Snapshot isolation refers to transactions seeing a consistent view of data: transactions can read data from a “snapshot” of data committed at the time the transaction starts. Any conflicting updates will cause the transaction to abort.
Default Read Concern for reads against secondaries: "local". Note: This read concern can return data that may be rolled back. This read concern does not guarantee causal consistency.
"majority" For read operations not associated with multi-document transactions, read concern "majority" guarantees that the data read has been acknowledged by a majority of the replica set members (i.e. the documents read are durable and guaranteed not to roll back).
The readConcern option allows you to control the consistency and isolation properties of the data read from replica sets and replica set shards.
Unfortunately the documentation really takes these concepts for granted, but the difference is not that easy to understand. I am not even sure I got it right, but Aly Cabral made this example in her talk about distributed transactions.
If I specify readConcern: snapshot, then I am going to have a consistent point in time across all of the shards.
Instead, if I specify readConcern: local or readConcern: majority, then I am going to have consistent snapshots per partition.
And she concludes by saying that with readConcern: snapshot, coordinating the snapshot finding across the sharded cluster could be expensive, so you should weigh the pros and cons of each readConcern.
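To make the distinction concrete, here is a toy model in plain JavaScript. It does not talk to MongoDB: the shard names, values, and times are hypothetical, and the two read functions only imitate the behaviour described in the talk (per-shard snapshots versus one cluster-wide point in time).

```javascript
// Toy model: each shard keeps a history of majority-committed versions,
// tagged with the cluster time at which they were committed (ascending).
// All names and numbers here are hypothetical.
const shards = {
  shardA: [{ time: 100, value: "a1" }, { time: 102, value: "a2" }],
  shardB: [{ time: 101, value: "b1" }, { time: 110, value: "b2" }],
};

// Latest version committed at or before `time` (undefined if none).
function versionAt(history, time) {
  const visible = history.filter((v) => v.time <= time);
  return visible[visible.length - 1];
}

// "majority"-like read: every shard answers from its own latest
// majority-committed point, so the points may differ between shards.
function readPerShard(cluster) {
  const result = {};
  for (const [name, history] of Object.entries(cluster)) {
    result[name] = history[history.length - 1];
  }
  return result;
}

// "snapshot"-like read: one cluster time is chosen and every shard
// answers as of that same point.
function readAtClusterTime(cluster, time) {
  const result = {};
  for (const [name, history] of Object.entries(cluster)) {
    result[name] = versionAt(history, time);
  }
  return result;
}

const perShard = readPerShard(shards);
const snapshot = readAtClusterTime(shards, 102);
console.log(perShard.shardA.time, perShard.shardB.time); // 102 110
console.log(snapshot.shardA.time, snapshot.shardB.time); // 102 101
```

In the per-shard read the two shards answer from different points in time (102 and 110), while the snapshot read pins both to time 102, so the later write on shardB stays invisible.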
She made a good example and those images really helped me, but I think an addition is necessary: the time field shown refers to the ClusterTime [1], a cluster-wide logical clock based on the Hybrid Logical Clock. The primary of each shard has its own ClusterTime value and adheres to the following rules:
ClusterTime Increment rule: The ClusterTime is incremented (“ticks”) only when there is a write to a primary node’s replication operation log (oplog).
ClusterTime Distribution rule: Cluster nodes (mongod, mongos, config server, clients) always track and include the greatest known ClusterTime when sending a message.
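The two rules can be sketched with a small simulation. This is only an illustration under simplifying assumptions: a plain integer counter stands in for the hybrid (wall clock plus counter) timestamp, and the Node class with its methods is hypothetical, not MongoDB's implementation.

```javascript
// Simplified logical clock: a plain counter stands in for the hybrid
// (wall-clock + counter) timestamp used by the real ClusterTime.
class Node {
  constructor(name) {
    this.name = name;
    this.clusterTime = 0; // greatest ClusterTime known to this node
    this.oplog = [];
  }

  // Increment rule: the clock ticks only when a primary appends to its oplog.
  writeToOplog(entry) {
    this.clusterTime += 1;
    this.oplog.push({ time: this.clusterTime, entry });
    return this.clusterTime;
  }

  // Distribution rule (sender side): attach the greatest known ClusterTime.
  send(receiver, payload) {
    receiver.receive({ payload, clusterTime: this.clusterTime });
  }

  // Distribution rule (receiver side): remember the greatest time seen,
  // without ticking the clock.
  receive(message) {
    this.clusterTime = Math.max(this.clusterTime, message.clusterTime);
  }
}

const primaryA = new Node("primaryA");
const primaryB = new Node("primaryB");
const mongos = new Node("mongos");

primaryA.writeToOplog("insert x"); // primaryA is now at 1
primaryA.writeToOplog("insert y"); // primaryA is now at 2
primaryA.send(mongos, "ack");      // mongos learns time 2
mongos.send(primaryB, "find");     // primaryB learns time 2, no tick
const t = primaryB.writeToOplog("insert z");
console.log(t); // 3
```

Because primaryB had already heard about time 2 through mongos, its next oplog write is stamped 3 rather than 1, which is how the clocks of independent shards stay comparable.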
If in the second picture the ClusterTime of the transaction is chosen to be 102, the shard having ClusterTime 110 could have majority-committed changes that I would not want to see during the transaction. From MongoDB 5.0, this time is configurable:
readConcern: {
    level: "snapshot",
    atClusterTime: Timestamp(1613577600, 1)
}
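As a small aid for reading that value: in Timestamp(1613577600, 1) the first argument is seconds since the Unix epoch and the second is an increment that orders operations within the same second. The sketch below, in plain JavaScript with hypothetical helper names, decodes the seconds part and builds the same read concern in MongoDB Extended JSON form (the $timestamp wrapper) instead of the shell's Timestamp helper.

```javascript
// Split a mongosh-style Timestamp(seconds, increment) into readable parts.
function describeClusterTime(seconds, increment) {
  return {
    wallClock: new Date(seconds * 1000).toISOString(),
    increment,
  };
}

// Build the read concern document in MongoDB Extended JSON form.
function snapshotReadConcern(seconds, increment) {
  return {
    level: "snapshot",
    atClusterTime: { $timestamp: { t: seconds, i: increment } },
  };
}

const described = describeClusterTime(1613577600, 1);
const readConcern = snapshotReadConcern(1613577600, 1);
console.log(described.wallClock); // 2021-02-17T16:00:00.000Z
console.log(JSON.stringify(readConcern));
```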
I am writing a report on MongoDB transactions where I have explained these concepts and many others. If you find errors/things to improve, I would really appreciate anyone's help.
1: https://dl.acm.org/doi/pdf/10.1145/3299869.3314049