
What's the difference between 'majority committed data' and 'the snapshot of majority committed data'?

What is the difference between majority committed data and a snapshot of majority committed data, and should I care about it? If the two concepts are really different, when and how should I choose one over the other?

I found these descriptions while reading the MongoDB reference on transactions: https://docs.mongodb.com/manual/core/transactions/#transaction-options-read-concern-write-concern-read-preference, but I cannot understand the difference between readConcern: majority and readConcern: snapshot.

asked Dec 24 '18 02:12 by gaols




1 Answer

Unfortunately, the documentation takes these concepts for granted, and the difference is not easy to grasp. I am not sure I have it entirely right, but Aly Cabral gave the following example in her talk about distributed transactions.

If I specify readConcern: snapshot, then I am going to have a consistent point in time across all of the shards.

enter image description here

If I instead specify readConcern: local or readConcern: majority, then I am going to have a consistent snapshot per partition (shard), but the per-shard snapshots are not necessarily taken at the same point in time.

enter image description here

She concludes by saying that, with readConcern: snapshot, finding and coordinating a common snapshot across the sharded cluster can be expensive, so you should weigh the pros and cons of each readConcern.
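To make the two behaviours concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB code: the shard names, the version histories and the functions versionAt, readSnapshot and readPerShard are all made up for illustration, and each history is assumed to be sorted by commit time.

```javascript
// Toy model, not MongoDB code: each shard keeps a history of
// majority-committed versions, tagged with the cluster time of the commit.
// Assumption: every history is sorted by ascending time.
const shards = {
  shardA: [{ time: 100, value: "a@100" }, { time: 110, value: "a@110" }],
  shardB: [{ time: 95, value: "b@95" }, { time: 102, value: "b@102" }],
};

// Latest version committed at or before `time`, or null if there is none.
function versionAt(history, time) {
  const visible = history.filter((v) => v.time <= time);
  return visible.length > 0 ? visible[visible.length - 1] : null;
}

// "snapshot"-like read: every shard is read at the same cluster time.
function readSnapshot(shards, time) {
  const result = {};
  for (const [name, history] of Object.entries(shards)) {
    result[name] = versionAt(history, time);
  }
  return result;
}

// "majority"-like read: every shard returns its own latest committed version.
function readPerShard(shards) {
  const result = {};
  for (const [name, history] of Object.entries(shards)) {
    result[name] = history[history.length - 1];
  }
  return result;
}

const snapshot = readSnapshot(shards, 102);
const perShard = readPerShard(shards);
console.log(snapshot.shardA.time, snapshot.shardB.time); // 100 102
console.log(perShard.shardA.time, perShard.shardB.time); // 110 102
```

In the snapshot-like read both shards are seen as of time 102, so the change committed on shardA at time 110 stays invisible. In the per-shard read each result is consistent on its own, but the two shards are observed at different times.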


Her example is good and the images really helped me, but I think one addition is necessary:

  • The time field shown refers to the ClusterTime [1], a cluster-wide logical clock based on the Hybrid Logical Clock. The primary of each shard has its own ClusterTime value and adheres to the following rules:

    ClusterTime Increment rule: The ClusterTime is incremented (“ticks”) only when there is a write to a primary node’s replication operation log (oplog).

    ClusterTime Distribution rule: Cluster nodes (mongod, mongos, config server, clients) always track and include the greatest known ClusterTime when sending a message.

    If, in the second picture, the ClusterTime of the transaction is chosen to be 102, the shard whose ClusterTime is 110 may hold majority-committed changes made after time 102, which I would not want to see during the transaction. From MongoDB 5.0, this timestamp can be set explicitly with atClusterTime for snapshot reads outside multi-document transactions:

    readConcern: {
        level: "snapshot",
        atClusterTime: Timestamp(1613577600, 1)
    }
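The two ClusterTime rules quoted above can also be sketched as a toy model in plain JavaScript. This is only my simplification: the real ClusterTime combines wall-clock seconds with a counter, while here it is a bare integer, and the Node class and its methods are hypothetical names, not part of any MongoDB API.

```javascript
// Toy model of the two ClusterTime rules; class and method names are hypothetical.
// Simplification: the clock is a plain integer counter.
class Node {
  constructor(name) {
    this.name = name;
    this.clusterTime = 0; // greatest ClusterTime this node knows about
  }

  // Increment rule: only a write to a primary's oplog ticks the clock.
  writeToOplog() {
    this.clusterTime += 1;
    return this.clusterTime;
  }

  // Distribution rule, sender side: attach the greatest known ClusterTime.
  send(payload) {
    return { payload, clusterTime: this.clusterTime };
  }

  // Distribution rule, receiver side: track the greatest value seen so far.
  receive(message) {
    this.clusterTime = Math.max(this.clusterTime, message.clusterTime);
  }
}

const primaryA = new Node("primaryA");
const primaryB = new Node("primaryB");
const mongos = new Node("mongos");

primaryA.writeToOplog(); // primaryA is now at 1
primaryA.writeToOplog(); // primaryA is now at 2
mongos.receive(primaryA.send("reply")); // mongos learns time 2
primaryB.receive(mongos.send("request")); // primaryB learns time 2 from the message
const next = primaryB.writeToOplog(); // ticks from 2 to 3

console.log(mongos.clusterTime, primaryB.clusterTime, next); // 2 3 3
```

Reads and messages never advance the clock by themselves. They only spread the greatest known value, so a later write on another shard is ordered after everything that node has already heard about.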
    

I am writing a report on MongoDB transactions in which I explain these concepts and many others. If you find errors or things to improve, I would really appreciate the help.


[1]: https://dl.acm.org/doi/pdf/10.1145/3299869.3314049

answered Oct 13 '22 15:10 by Marco Luzzara