Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CouchDB map/reduce by any document property at runtime?

I come from a SQL world where lookups are done by several object properties (published = TRUE or user_id = X) and there are no joins anywhere (because of the 1:1 cache layer). It seems that a document database would be a good fit for my data.

I am trying to figure-out if there is a way to pass one (or more) object properties to a CouchDB map/reduce function to find matching documents in a database without creating dozens of views for each document type.

Is it possible to pass the desired document property key(s) to match at run-time to CouchDB and have it return the objects that match (or the count of object that match for pagination)?

For example, on one page I want all posts with a doc.user_id of X that are doc.published. On another page I might want all documents with doc.tags[] with the tag "sport".

like image 645
Xeoncross Avatar asked Feb 24 '23 18:02

Xeoncross


2 Answers

You could build a view that iterates over the keys in the document, and emits a key of [propertyName, propertyValue] - that way you're building a single index with EVERYTHING prop/value in it. Would be massive, no idea how performance would be to build, and disk usage (probably bad).

Map function would look something like:

// note - totally untested, my CouchDB fu is rusty
function(doc) {
  for(prop in doc) {
    emit([prop, doc[prop]], null);
  }
}

Works for the basic case of simple properties, and can be extended to be smart about arrays, and emit a prop/value pair for each item in the array. That would let you handle the tags.

To query on it, set [prop] as your query key on the view.

like image 53
madlep Avatar answered Feb 26 '23 07:02

madlep


Basically, no.

The key difference between something like Couch and a SQL DB is that the only way to query in CouchDB is essentially through the views/indexes. Indexes in SQL are optional. They exist (mostly) to boost performance. For example, if you have a small DB, your app will run just fine on SQL with 0 indexes. (Might be some issue with unique constraints, but that's a detail.)

The overall point being is that part of the query processor in a SQL database includes other methods of data access beyond simply indexes, notably table scans, merge joins, etc.

Couch has no query processor. It has views (defined by JS) used to define B-Tree indexes.

And, that's it. That's the hammer of Couch. It's a good hammer. It's been lasting the data processing world for basically 40 years.

Indexes are somewhat expensive to create in Couch (based on data volume) which is why "temporary views" are frowned upon. And they have a cost in maintenance as well, so views need to be a conscious design element in your database. At the same time, they're a bit more powerful than normal SQL indexes as well.

You can readily add your own query processing on top of Couch, but that will be more work for you. You can create a few select views, on your most popular or selective criteria, and then filter the resulting documents by other criteria in your own code. Yes, you have to do it, so you have to question whether the effort involved is worth more than whatever benefits you feel Couch is offering your (HTTP API, replication, safe, always consistent datastore, etc.) over a SQL solution.

like image 34
Will Hartung Avatar answered Feb 26 '23 08:02

Will Hartung