
CouchDB view is extremely slow

I have a CouchDB (v0.10.0) database that is 8.2 GB in size and contains 3890000 documents.

Now, I have the following as the Map of the view

function(doc) { emit([doc.Status], doc); }

And it takes forever to load (4 hours and still no result).

Here's some extra information that might help describe the situation:

  1. The view is not a temp view. The view was defined before the 3890000 documents were inserted.

  2. There isn't anything else on the server. It is an Ubuntu box with nothing but the defaults installed.

  3. I can see the CPU working hard (it sometimes spikes to 100%). Memory usage fluctuates as well but is not growing.

So my question is:

  1. What is actually happening in the background?
  2. Is this a "one time" thing, where I have to wait once and it will work afterwards?
Chi Chan asked Oct 11 '10 20:10


1 Answer

Don't emit the whole doc. It's unnecessary. You can instead run your query with include_docs=true, which will let you access the document via each row's doc attribute.

When you emit the whole doc you make the index as large or larger than your entire database. :)
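As a rough sketch of that advice, the map function can emit only the key, and the query can ask for the documents at read time. The names `mydb`, `docs`, and `by_status`, the `localhost:5984` address, and the `viewUrl` helper are placeholders for illustration, not taken from the question. In the design document itself the map function would be anonymous; it is named `map` here only so the snippet stands alone.

```javascript
// Map function: emit only the key. A null value keeps the view index small,
// because the document body is no longer copied into it.
// (In a design document this would be an anonymous function(doc) { ... }.)
function map(doc) {
  emit([doc.Status], null);
}

// Hypothetical helper: builds a view query URL with include_docs=true so that
// each returned row carries the full document in its "doc" attribute.
// The key is JSON-encoded to match the [doc.Status] array emitted above.
function viewUrl(base, db, designDoc, view, status) {
  const key = encodeURIComponent(JSON.stringify([status]));
  return `${base}/${db}/_design/${designDoc}/_view/${view}?key=${key}&include_docs=true`;
}

// Placeholder server, database, design document, view name, and status value.
const url = viewUrl("http://localhost:5984", "mydb", "docs", "by_status", "active");
console.log(url);
```

Changing the map function means the view has to be rebuilt once, but the resulting index holds only keys and document ids rather than a second copy of every document.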

mikeal answered Sep 23 '22 23:09