
Why are reads from CouchDB so slow? (1.5MB/s or thereabouts)

I have a CouchDB (1.1.1) server running that contains a lot of documents in the 400-600KB size range.

If I time fetching a full document from the database (not from a view, just the raw document), it takes 200-400 ms to complete, which equates to around 1.5 MB/s of throughput.

If I write the same data to raw files on disk, they load in 10-20 ms (around 25-50 MB/s).

I'd expect CouchDB to have some overhead, but an order of magnitude (and some) seems crazy for what is essentially a read!

Can anyone shed some light onto why this might be the case?

Update: As requested below, a timing from curl:

# time curl http://localhost:5984/[dbname]/[documentname]

real    0m0.684s
user    0m0.004s
sys     0m0.020s

The fetched document was 642842 bytes. I've tested it on both a standard 1 TB hard disk and an EC2 instance (EBS volume), with similar results.

Asked Mar 21 '12 by Jonathan Williamson



2 Answers

I think a few factors are at work here:

  1. You are fetching over HTTP, which is fundamentally a higher-latency protocol. In particular, you are building up a TCP connection from scratch by using curl. (Web browsers and most client software keep a pool of persistent, HTTP/1.1 keepalive connections; see the sketch after this list.) But fundamentally, CouchDB chooses a "slower" protocol because it is so universal and so standard.
  2. Your documents are on the larger side for CouchDB. Most documents are single- or double-digit KB, not triple. CouchDB encodes/decodes that JSON in one big gulp (i.e., it is not streaming it from disk).
  3. Not only is EC2 (even EBS) I/O less than ideal for a database (it has high latency itself), but it can also fluctuate as your neighbors generate unpredictable I/O bursts that you compete with.
  4. CouchDB is a filesystem on top of a filesystem. The .couch file looks much like a filesystem itself, so you are multiplying inefficiencies. The .couch file and its metadata require random I/O against the storage, and reading the document requires random I/O within the .couch file. You may see the effects of disk latency multiplied. Instead of comparing reading a document vs. reading a filesystem file, you might compare reading a document vs. reading an equivalent MySQL row.
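
To illustrate point 1, here is a rough Python sketch (using the requests library; the database and document names in the URL are placeholders, not values from this question). A persistent Session reuses one keepalive TCP connection, so only the first request pays the connection-setup cost:

import time
import requests  # HTTP client with built-in connection pooling

url = "http://localhost:5984/dbname/docname"  # placeholder db/document

session = requests.Session()  # reuses a single keepalive TCP connection

for i in range(3):
    start = time.time()
    resp = session.get(url)
    resp.raise_for_status()
    print("request %d: %d bytes in %.3f s" % (i, len(resp.content), time.time() - start))

The later requests are a fairer measure of CouchDB's raw read latency than a cold curl call.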

Note that I am not saying CouchDB is actually fast and that your results are incorrect. Quite the opposite: CouchDB is slower than many people expect. To some degree it has room to improve and optimize; but primarily CouchDB has decided that those costs are worthwhile for the broader good it brings.

CouchDB fails the benchmarks, and aces the college of hard knocks. I suggest that you next benchmark a full load on CouchDB, simulating multiple concurrent accesses and getting as close as you can to your real-world demands on it. That will be a more helpful test, and generally speaking CouchDB performs impressively there.
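
For illustration only, a minimal sketch of such a concurrent load test in Python (using requests and concurrent.futures; the URL, worker count, and request count are assumptions, not values from this answer):

import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:5984/dbname/docname"  # placeholder
WORKERS = 20     # simulated concurrent clients (assumption)
REQUESTS = 200   # total requests to issue (assumption)

def fetch(_):
    # each call makes its own request, mimicking independent clients
    return len(requests.get(URL).content)

start = time.time()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    sizes = list(pool.map(fetch, range(REQUESTS)))
elapsed = time.time() - start

print("%d requests, %.1f MB total in %.2f s (%.1f req/s)"
      % (REQUESTS, sum(sizes) / 1e6, elapsed, REQUESTS / elapsed))

Tune the worker and request counts toward your real-world access pattern rather than toward a single-read micro-benchmark.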

That said, CouchDB is a domain-specific database and so it may become clear that you are looking for a different tool as well.

Answered Oct 04 '22 by JasonSmith


How are you retrieving the document? If you are using some code then please include that code and any libraries that you're using.

Or just use curl to retrieve the document. For example, I just ran time curl http://localhost:5984/bwah/foo and got the document in 0.017 s. An important note is that I'm on a machine with SSDs.

Also, doing one read is not enough to establish the throughput you can expect from CouchDB, or from any server software for that matter. You need to make a lot of requests and then see what the average and median times are.
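
As a rough sketch of that kind of measurement (Python with requests and the statistics module; the URL and request count are placeholders, not from this answer):

import time
import statistics
import requests

URL = "http://localhost:5984/dbname/docname"  # placeholder
N = 100  # number of requests to sample (assumption)

session = requests.Session()  # persistent connection, like a real client
latencies = []
for _ in range(N):
    start = time.time()
    session.get(URL)
    latencies.append(time.time() - start)

print("mean: %.1f ms, median: %.1f ms"
      % (1000 * statistics.mean(latencies), 1000 * statistics.median(latencies)))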

Answered Oct 04 '22 by Sam Bisbee