When running a web application in a farm that uses a distributed datastore that's eventually consistent (CouchDB in my case), should I be ensuring that a given user is always directed to same the datastore instance?
It seems to me that the alternate approach, where any web request can use any data store, adds significant complexity to deal with consistency issues (retries, checks, etc). On the other hand, if a user in a given session is always directed to the same couch node, won't my consistency issues revolve mostly around "shared" user data and thus be greatly simplified?
I'm also curious about strategies for directing users but maybe I'll keep that for another question (comments welcome).
Data Nodes are plug-n-play and can be added to a deployment at any time. Data Nodes integrate seamlessly with existing deployments. Use Data Nodes to reduce the processing load on processor appliances by removing the data storage processing load from the processor.
A key benefit of an eventually consistent database is that it supports the high availability model of NoSQL. Eventually consistent databases prioritize availability over strong consistency.
Eventual consistency and strong consistency are two design and architecture choices for how to deal with distributed systems at their edge cases - when they malfunction. Distributed storage systems have three desirable qualities:
The nodes communicate with each other through network and data replication protocols that are specific to the database architecture. Each replica has a copy of (all) the data. And therefore has the resources to serve all the read and write requests client applications send to that node.
According to the CAP Theorem, distributed systems can either have complete consistency (all nodes see the same data at the same time) or availability (every request receives a response). You'll have to trade one for the other during a partition or datastore instance failure.
Should I be ensuring that a given user is always directed to same the datastore instance?
Ideally, you should not! What will you do when the given instance fails? A major feature of a distributed datastore is to be available in spite of network or instance failures.
If a user in a given session is always directed to the same couch node, won't my consistency issues revolve mostly around "shared" user data and thus be greatly simplified?
You're right, the architecture would be a lot more simpler that way, but again, what would you do if that instance fails? A lot of engineering effort has gone into distributed systems to allow multiple instances to reply to a query. I am not sure about CouchDB, but Cassandra allows you to choose your consistency model, you'll have to tradeoff availability for higher degree of consistency. The client is configured to request servers in a round-robin fashion by default, which distributes the load.
I would recommend you read the Dynamo paper. The authors describe a lot of engineering details behind a distributed database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With