I am considering a NoSQL solution for a current project, but I'm hesitant about the 'eventual consistency' clause in many of these databases. Is eventual consistency different from dealing with a MySQL database where replication lags? One solution I have used in the past with lagging replication is to read from the master when immediate data consistency is needed.
However, I am confused as to why relational databases claim to have strong data consistency. I guess I should use transactions, and that will give me strong consistency. Is it good practice, then, to write applications assuming MySQL replication may lag?
Single-threaded MySQL replication is sequentially consistent. On the source instance, events are written to the binary log in the order they are committed. The replica replays them in exactly the same order, whether that's necessary for data integrity or not.
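If you want to see that ordering for yourself, something like the following works on a stock MySQL setup (the binary log file name is just an example; use whatever SHOW BINARY LOGS reports, and note that SHOW REPLICA STATUS was called SHOW SLAVE STATUS before MySQL 8.0.22):

    -- On the source: list the binary logs, then inspect the events in one of them.
    -- Events appear in the order the transactions committed.
    SHOW BINARY LOGS;
    SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 20;

    -- On the replica: the applier replays those same events in the same order.
    SHOW REPLICA STATUS\G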
The fundamental problem with replication is that keeping multiple copies of the data can lead to consistency problems: at any given moment, a replica may not yet reflect the latest committed changes on the source.
MySQL Cluster Manager guarantees eventual consistency among its agents, meaning that any message communicated among agents is either delivered to all agents or to none, rather than being delivered to some and missed by others.
Replication in MySQL is usually quick to get the data to the slave (quicker than you can run an UPDATE on the master and switch to another window to run a SELECT on the slave), provided the network connections are up and everything is running normally.
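For example, using a hypothetical accounts table, that manual test looks like this:

    -- Session 1, on the source (master):
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;

    -- Session 2, on the replica (slave), a moment later:
    SELECT balance FROM accounts WHERE id = 1;
    -- In a healthy setup this usually already shows the new balance;
    -- if replication is lagging, it shows the old (but still consistent) value.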
Consistency in the sense it is used in ACID means that all constraints are satisfied before and after any change. When a system assures that you can't read inconsistent data, it is saying, for example, that you will never read data where a child row references a non-existent parent row, or where half of a transaction has been applied but the other half hasn't yet (the textbook example is debiting one bank account without yet having credited the recipient account).
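In SQL terms, that textbook bank transfer is a single transaction (the accounts table and the amounts here are only illustrative):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit the sender
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit the recipient
    COMMIT;
    -- Under InnoDB's default isolation level, other sessions see both
    -- updates or neither, never just one of them.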
Replication in MySQL is asynchronous by default, or "semi-synchronous" at best, and it certainly does lag in either case. In fact, the replica is always at least a fraction of a second behind, because the master doesn't write changes to its binary log until the transaction commits; the replica then has to download the binary log and replay the events.
But the changes are still atomic. You can't read data that is partially changed. You either read committed changes, in which case all constraints are satisfied, or else the changes haven't been committed yet, in which case you see the state of data from before the transaction began.
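A minimal two-session sketch of that behaviour (again with a hypothetical accounts table, using InnoDB's default REPEATABLE READ isolation):

    -- Session 1:
    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- ... not yet committed ...

    -- Session 2:
    SELECT balance FROM accounts WHERE id = 1;
    -- Returns the pre-transaction balance; the uncommitted debit is invisible.

    -- Session 1:
    COMMIT;
    -- Only now do new reads in other sessions see the new balance.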
So you might temporarily read old data in a replication system that lags, but you won't read inconsistent data.
Whereas in an "eventually consistent" system, you might read data that is partially updated, where one account has been debited but the other has not yet been credited. So you can see inconsistent data.
You're right that you may need to be careful about reading from replicas if your application requires absolutely current data. Each application has a different tolerance for replication lag, and in fact within one application, different queries have different tolerance for lag. I did a presentation about this: Read/Write Splitting for MySQL and PHP (Percona webinar 2013)
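One rough building block for that kind of routing is to check how far behind a replica is before trusting it with a lag-sensitive query (Seconds_Behind_Source was called Seconds_Behind_Master before MySQL 8.0.22; the one-second threshold is just an example):

    -- On the replica, before sending it a lag-sensitive read:
    SHOW REPLICA STATUS\G
    -- Look at Seconds_Behind_Source. If it is NULL (replication stopped) or
    -- larger than the lag this particular query can tolerate (say, 1 second),
    -- send the query to the source instead of the replica.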