We are looking at a document db storage solution with failover clustering for a read/write-intensive application.
We will average around 40K concurrent writes per second to the db (peaks can go up to 70,000), and may have roughly the same number of reads happening.
We also need a mechanism for the db to notify us about newly written records (some kind of trigger at the db level).
What would be a good choice of document db, and how should we approach the related capacity planning?
Update: more details on the expectations.
CouchDB accepts queries via a RESTful HTTP API, while MongoDB uses its own query language. CouchDB prioritizes availability, while MongoDB prioritizes consistency. MongoDB has a much larger user base than CouchDB, making it easier to find support and hire employees for this database solution.
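To make that contrast concrete, here is a minimal Python sketch of querying each system (the server addresses, the mydb database, and the field names are all assumptions):

    import requests
    from pymongo import MongoClient

    # CouchDB: everything, including queries, goes over plain HTTP
    resp = requests.get(
        "http://localhost:5984/mydb/_all_docs",
        params={"include_docs": "true"},
    )
    rows = resp.json()["rows"]

    # MongoDB: the driver speaks MongoDB's own query language
    client = MongoClient("mongodb://localhost:27017")
    docs = list(client.mydb.events.find({"type": "order"}))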
Sharded clusters in MongoDB are another way to improve performance. Like replication, sharding is a way to distribute large data sets across multiple servers; but where replication copies the data, sharding uses what's called a shard key to partition the data into pieces ("shards") and spread them across multiple servers.
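A rough sketch of setting that up, assuming a running sharded cluster and a hypothetical mydb.events collection; enableSharding and shardCollection are MongoDB's standard admin commands:

    from pymongo import MongoClient

    # Connect to a mongos query router (the address is an assumption)
    client = MongoClient("mongodb://mongos-host:27017")

    # Enable sharding for the database, then shard the collection.
    # A hashed shard key spreads a write-heavy load evenly across shards.
    client.admin.command("enableSharding", "mydb")
    client.admin.command("shardCollection", "mydb.events",
                         key={"user_id": "hashed"})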
MongoDB performed better than Couchbase on a 4-node cluster, but worse on 10- and 20-node clusters. Cassandra did not perform as well, reaching 2,570 ops/sec on a 4-node cluster, 4,230 ops/sec on a 10-node cluster, and 6,563 ops/sec on a 20-node cluster.
MongoDB and CouchDB are both document-based NoSQL databases. A document database is also called a document store; document stores are typically used to hold semi-structured data as self-describing documents, together with a detailed description of that data.
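For example, a single order could be stored as one self-describing document (all field names here are purely illustrative):

    # Fields can be nested and can vary from one document to the next
    order = {
        "_id": "order-1001",
        "type": "order",
        "user_id": 42,
        "items": [
            {"sku": "A-17", "qty": 2},
            {"sku": "B-03", "qty": 1},
        ],
        "created_at": "2011-09-01T12:00:00Z",
    }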
if "20,000 concurrent writes" means inserts then I would go for CouchDB and use "_changes" api for triggers. But with 20.000 writes you would need a stable sharding aswell. Then you would better take a look at bigcouch
And if "20.000" concurrent writes consist "mostly" updates I would go for MongoDB for sure, since Its "update in place" is pretty awesome. But then you should handle triggers manually, but using another collection to update in place a general document can be a handy solution. Again be careful about sharding.
Finally, I think you cannot select a database on concurrency numbers alone; you need to plan the API (how you will retrieve data) and then look at the options at hand.
I would recommend MongoDB. My requirements weren't nearly as high as yours, but they were reasonably close. Assuming you'll be using C#, I recommend the official MongoDB C# driver and the InsertBatch method with SafeMode turned on. It will write data as fast as your file system can handle.
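The same batched, acknowledged-insert pattern as a rough Python sketch (insert_many with write concern w=1 plays the role of InsertBatch with SafeMode in the C# driver; the collection and field names are made up):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")

    # w=1 makes the server acknowledge every write, like SafeMode
    events = client.mydb.get_collection(
        "events", write_concern=WriteConcern(w=1)
    )

    # Batching amortizes per-request overhead; unordered lets the
    # server process the batch in parallel
    batch = [{"n": i, "payload": "x" * 64} for i in range(10_000)]
    events.insert_many(batch, ordered=False)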
That said, I'd also recommend looking into RavenDB. It supports everything you're looking for, but for the life of me I couldn't get it to perform anywhere close to Mongo.
The only other database that came close to MongoDB was Riak. Its default Bitcask backend is ridiculously fast as long as you have enough memory to store the keyspace, but as I recall it doesn't support triggers.
Membase (and the soon-to-be-released Couchbase Server) will easily handle your needs and provides dynamic scalability (adding or removing nodes on the fly) and replication with failover. The memcached caching layer on top will easily handle 200K ops/sec, and you can scale out linearly with multiple nodes to keep up with persisting the data to disk.
We've got some recent benchmarks showing extremely low latency (which roughly equates to high throughput): http://10gigabitethernet.typepad.com/network_stack/2011/09/couchbase-goes-faster-with-openonload.html
I don't know how important it is for you to have a supported, enterprise-class product with engineering and QA resources behind it, but that's available too.
Edit: I forgot to mention that there is already a built-in trigger interface, and we're extending it even further to track when data hits disk (is persisted) or is replicated.
Perry