I have a CouchDB (1.1.1) server running that contains a lot of documents in the 400-600KB size range.
If I time fetching a full document from the database (not via a view, just the raw document), it takes 200-400 ms to complete, which equates to around 1.5 MB/s of throughput.
If I write the same data to raw files on disk, they load in 10-20 ms (around 25-50 MB/s).
I'd expect CouchDB to have some overhead, but an order of magnitude (and then some) seems crazy for what is essentially a read!
Can anyone shed some light onto why this might be the case?
Update: As requested below, a timing from curl:
# time curl http://localhost:5984/[dbname]/[documentname]
real 0m0.684s
user 0m0.004s
sys 0m0.020s
The fetched document was 642842 bytes. I've tested it on both a standard 1 TB hard disk and an EC2 instance (EBS volume) with similar results.
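For scale, the update's own numbers put the throughput even lower than the 1.5 MB/s estimate in the question (a quick check using the figures above):

```python
# Throughput implied by the curl timing above: 642842 bytes in 0.684 s total.
size_bytes = 642_842
elapsed_s = 0.684

throughput_mb_s = size_bytes / elapsed_s / 1e6  # decimal megabytes per second
print(f"{throughput_mb_s:.2f} MB/s")  # roughly 0.94 MB/s
```

Note this folds curl's startup and connection time into the transfer, which is part of what the answers below discuss.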
I think this is a few factors:

- curl opens a fresh connection for each request. (Web browsers and most client software keep a pool of persistent, HTTP/1.1 keep-alive connections.) But fundamentally, CouchDB chooses a "slower" protocol because it is so universal and so standard.
- Every read goes through the .couch file, so you may see the effects of disk latency multiplied. Instead of comparing reading a document vs. reading a filesystem file, you might compare reading a document vs. reading an equivalent MySQL row.

Note, I am not saying that CouchDB is actually fast and your results are incorrect. Quite the opposite: CouchDB is slower than many people expect. To some degree it has room to improve and optimize; but primarily CouchDB has decided that those costs are worthwhile for the broader good it brings.
CouchDB fails the benchmarks, and aces the college of hard knocks. I suggest that you next benchmark a full load on CouchDB, simulating your expected demand for multiple concurrent access, and get as close as you can to your real-world demands on it. That will be a more helpful test and generally speaking CouchDB performs impressively there.
That said, CouchDB is a domain-specific database, and so it may become clear that a different tool is a better fit for your workload.
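The keep-alive point is easy to reproduce without CouchDB at all. Below is a minimal, self-contained sketch using Python's built-in http.server and http.client; the payload size and request count are arbitrary choices for illustration, not anything from the original benchmark:

```python
import http.client
import http.server
import threading
import time

# Serve a fixed payload from a background thread so the demo is self-contained.
PAYLOAD = b"x" * 100_000

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 enables keep-alive by default

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

N = 20

# A new connection per request: pays TCP setup every time, like a one-shot curl.
start = time.perf_counter()
for _ in range(N):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    assert conn.getresponse().read() == PAYLOAD
    conn.close()
cold = time.perf_counter() - start

# One persistent keep-alive connection reused for all N requests.
conn = http.client.HTTPConnection("127.0.0.1", port)
start = time.perf_counter()
for _ in range(N):
    conn.request("GET", "/")
    assert conn.getresponse().read() == PAYLOAD
reused = time.perf_counter() - start
conn.close()

print(f"new connection each time: {cold:.4f}s  persistent: {reused:.4f}s")
server.shutdown()
```

On a local loopback the difference is small in absolute terms, but it illustrates why a single cold curl request overstates per-document latency compared with a client that keeps connections open.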
How are you retrieving the document? If you are using some code then please include that code and any libraries that you're using.
Or just use curl to retrieve the document. E.g., I just did

time curl http://localhost:5984/bwah/foo

and got the document in 0.017 s. An important note is that I'm on a machine with SSDs.
Also, doing one read is not enough to suggest the throughput you can expect from CouchDB, or any server software for that matter. You need to do a lot of requests and then see what the average and median times are.