
Why are reads from CouchDB so slow? (1.5MB/s or thereabouts)

I have a CouchDB (1.1.1) server running that contains a lot of documents in the 400-600KB size range.

If I time fetching a full document from the database (not from a view, just the raw document), it takes 200-400 ms to complete, which equates to around 1.5 MB/s of throughput.

If I write the same data to raw files on disk, they load in 10-20 ms (around 25-50 MB/s).

I'd expect CouchDB to have some overhead, but an order of magnitude (and some) seems crazy for what is essentially a read!

Can anyone shed some light onto why this might be the case?

Update: As requested below, a timing from curl:

# time curl http://localhost:5984/[dbname]/[documentname]

real    0m0.684s
user    0m0.004s
sys     0m0.020s

The fetched document was 642842 bytes. I've tested it on both a standard 1 TB hard disk and an EC2 instance (EBS volume), with similar results.

Asked Mar 21 '12 by Jonathan Williamson



2 Answers

I think a few factors are at work here:

  1. You are fetching over HTTP, which is fundamentally a higher-latency protocol. In particular, you are building up a TCP connection from scratch by using curl. (Web browsers and most client software keep a pool of persistent, HTTP/1.1 keepalive connections; see the sketch after this list.) But fundamentally, CouchDB chooses a "slower" protocol because it is so universal and so standard.
  2. Your documents are on the larger side for CouchDB. Most documents are single- or double-digit KB, not triple. CouchDB encodes/decodes that JSON in one big gulp (i.e., it is not streaming it from disk).
  3. Not only is EC2 (even EBS) I/O less than ideal for a database (it has high latency itself), but it can also fluctuate as your neighbors generate unpredictable I/O bursts that you compete with.
  4. CouchDB is a filesystem on top of a filesystem. The .couch file looks much like a filesystem itself, so you are multiplying inefficiencies. The .couch file and its metadata require random I/O against the storage, and reading the document requires random I/O within the .couch file. You may see the effects of disk latency multiplied. Instead of comparing reading a document vs. reading a filesystem file, you might compare reading a document vs. reading an equivalent MySQL row.
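
To illustrate point 1, here is a rough Python sketch (using the requests library; the database and document names in the URL are placeholders, not values from this question). A persistent Session reuses one keepalive TCP connection, so only the first request pays the connection-setup cost:

import time
import requests  # HTTP client with built-in connection pooling

url = "http://localhost:5984/dbname/docname"  # placeholder db/document

session = requests.Session()  # reuses a single keepalive TCP connection

for i in range(3):
    start = time.time()
    resp = session.get(url)
    resp.raise_for_status()
    print("request %d: %d bytes in %.3f s" % (i, len(resp.content), time.time() - start))

The later requests are a fairer measure of CouchDB's raw read latency than a cold curl call.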

Note that I am not saying CouchDB is actually fast and that your results are incorrect. Quite the opposite: CouchDB is slower than many people expect. To some degree it has room to improve and optimize; but primarily CouchDB has decided that those costs are worthwhile for the broader good it brings.

CouchDB fails the benchmarks, and aces the college of hard knocks. I suggest that you next benchmark a full load on CouchDB, simulating multiple concurrent accesses and getting as close as you can to your real-world demands on it. That will be a more helpful test, and generally speaking CouchDB performs impressively there.
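
For illustration only, a minimal sketch of such a concurrent load test in Python (using requests and concurrent.futures; the URL, worker count, and request count are assumptions, not values from this answer):

import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:5984/dbname/docname"  # placeholder
WORKERS = 20     # simulated concurrent clients (assumption)
REQUESTS = 200   # total requests to issue (assumption)

def fetch(_):
    # each call makes its own request, mimicking independent clients
    return len(requests.get(URL).content)

start = time.time()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    sizes = list(pool.map(fetch, range(REQUESTS)))
elapsed = time.time() - start

print("%d requests, %.1f MB total in %.2f s (%.1f req/s)"
      % (REQUESTS, sum(sizes) / 1e6, elapsed, REQUESTS / elapsed))

Tune the worker and request counts toward your real-world access pattern rather than toward a single-read micro-benchmark.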

That said, CouchDB is a domain-specific database and so it may become clear that you are looking for a different tool as well.

Answered Oct 04 '22 by JasonSmith


How are you retrieving the document? If you are using some code then please include that code and any libraries that you're using.

Or just use curl to retrieve the document. For example, I just ran time curl http://localhost:5984/bwah/foo and got the document in 0.017 s. An important note is that I'm on a machine with SSDs.

Also, doing one read is not enough to establish the throughput you can expect from CouchDB, or from any server software for that matter. You need to make a lot of requests and then see what the average and median times are.
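
As a rough sketch of that kind of measurement (Python with requests and the statistics module; the URL and request count are placeholders, not from this answer):

import time
import statistics
import requests

URL = "http://localhost:5984/dbname/docname"  # placeholder
N = 100  # number of requests to sample (assumption)

session = requests.Session()  # persistent connection, like a real client
latencies = []
for _ in range(N):
    start = time.time()
    session.get(URL)
    latencies.append(time.time() - start)

print("mean: %.1f ms, median: %.1f ms"
      % (1000 * statistics.mean(latencies), 1000 * statistics.median(latencies)))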

Answered Oct 04 '22 by Sam Bisbee