 

How do I create a "like" filter view in CouchDB

Here's an example of what I need in SQL:

SELECT name FROM employ WHERE name LIKE '%bro%'

How do I create a view like that in CouchDB?

asked Apr 01 '11 06:04 by yuda



People also ask

What are views in CouchDB?

Basically, views are JavaScript code stored in a special document inside the database they operate on. This special document is called a design document in CouchDB, and each design document can implement multiple views. Please consult the official CouchDB Design Documents guide to learn more about how to write views.
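To make the design-document idea concrete, here is a minimal sketch of one. The document ID `_design/test` and the view name `names` are illustrative choices (they match the view URL used in the second answer). Map functions are stored as strings, so this sketch serializes an ordinary JavaScript function with `Function.prototype.toString()`.

```javascript
// Map function for a "names" view. `emit` is provided by CouchDB's view
// server at indexing time; it is neither defined nor called in this script.
function byName(doc) {
  if (doc.name) {
    emit(doc.name, null);
  }
}

// A design document is an ordinary JSON document whose _id starts with
// "_design/". Each entry under `views` defines one view.
const designDoc = {
  _id: "_design/test", // illustrative ID, not required by CouchDB
  language: "javascript",
  views: {
    names: { map: byName.toString() }, // map source stored as a string
  },
};

// Serialized, this would be the body of: PUT /<database>/_design/test
console.log(JSON.stringify(designDoc, null, 2));
```

Writing the map as a real function and serializing it keeps it syntax-checked by your editor, at the cost of `emit` looking like an undefined global.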

What is Mango in CouchDB?

Mango is a MongoDB inspired query language interface for Apache CouchDB. Mango provides a single HTTP API endpoint that accepts JSON bodies via HTTP POST. These bodies provide a set of instructions that will be handled with the results being returned to the client in the same order as they were specified.


2 Answers

The simple answer is that CouchDB views aren't ideal for this.

The more complicated answer is that this type of query tends to be very inefficient in typical SQL engines too, and so if you grant that there will be tradeoffs with any solution then CouchDB actually has the benefit of letting you choose your tradeoff.

1. The SQL Ways

When you do SELECT ... WHERE name LIKE '%bro%', all the SQL engines I'm familiar with must do what's called a "full table scan". This means the server reads every row in the relevant table and brute-force scans the field to see if it matches.
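As a rough in-memory analogy (not how any particular SQL engine is implemented), the brute-force scan amounts to the loop below. The `employ` rows are made-up sample data, and `includes` is case-sensitive, whereas the case sensitivity of SQL `LIKE` depends on the column's collation.

```javascript
// A toy in-memory "table" (hypothetical sample data).
const employ = [
  { name: "Dobros" },
  { name: "Alice" },
  { name: "Ambrose" },
  { name: "Brown" },
];

// Emulates SELECT name FROM employ WHERE name LIKE '%bro%' by brute force:
// every row is visited, so the cost grows linearly with the table size.
function likeContains(rows, needle) {
  const result = [];
  for (const row of rows) {
    if (typeof row.name === "string" && row.name.includes(needle)) {
      result.push(row.name);
    }
  }
  return result;
}

console.log(likeContains(employ, "bro")); // [ 'Dobros', 'Ambrose' ]
```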

You can do this in CouchDB 2.x with a Mango query (POSTed to /some_database/_find) using the $regex operator. The query would look something like this for the basic case:

{"selector":{
  "name": {
    "$regex": "bro"
  }
}}

There do not appear to be any options exposed for case sensitivity, etc., but you could extend the pattern to match only at the beginning or end, or to match more complicated patterns. If you can also restrict your query via an operator on some other (indexable) field, that would likely help performance. As the documentation warns:

Regular expressions do not work with indexes, so they should not be used to filter large data sets. […]
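A small sketch of building that Mango request body from a user-supplied term. `likeSelector` and `escapeRegex` are hypothetical helpers, not part of CouchDB. The body would be POSTed to `/some_database/_find`, and `fields: ["name"]` plays the role of `SELECT name`. The optional `extra` argument adds equality conditions on other fields (the `type` field in the example is an assumption about your documents); if such a field is indexed, it may let CouchDB narrow the candidates before the regex is applied.

```javascript
// Escape regex metacharacters so a user-supplied term is matched literally
// (so "a.b" does not also match "axb").
function escapeRegex(term) {
  return term.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// Builds a Mango body roughly equivalent to
//   SELECT name FROM employ WHERE name LIKE '%<term>%'
// `extra` holds optional equality conditions on other (ideally indexed) fields.
function likeSelector(term, extra = {}) {
  return {
    selector: { ...extra, name: { $regex: escapeRegex(term) } },
    fields: ["name"],
  };
}

const body = likeSelector("bro", { type: "employee" });
console.log(JSON.stringify(body));
// {"selector":{"type":"employee","name":{"$regex":"bro"}},"fields":["name"]}
```

To emulate `LIKE 'bro%'` instead, anchor the pattern at the start: `"^" + escapeRegex(term)`.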

You can do a full scan in CouchDB 1.x too, using a temporary view:

POST /some_database/_temp_view

{"map": "function (doc) { if (doc.name && doc.name.indexOf('bro') !== -1) emit(null); }"}

This will look through every single document in the database and give you a list of matching documents. You can tweak the map function to also match on a document type, or to emit with a certain key for ordering — emit(doc.timestamp) — or some data value useful to your purpose — emit(null, doc.name).
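To see what such a temporary view returns, the map source can be run locally. `runMap` below is a hypothetical stand-in for CouchDB's view server (it ignores key sorting, reduce, and everything else), the documents are made-up sample data, and the map source is the `emit(null, doc.name)` variant mentioned above. Note that `indexOf` is case-sensitive, so "Brown" does not match.

```javascript
// Hypothetical local harness: compiles a map function given as a string
// (as in the _temp_view request) and collects the rows it emits.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  // Stand-in for the view server's emit(): records one row per call.
  const emit = (key, value) =>
    rows.push({ id: currentId, key: key ?? null, value: value ?? null });
  // Compile the source with our emit in scope.
  const mapFn = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  return rows;
}

const mapSource =
  "function (doc) { if (doc.name && doc.name.indexOf('bro') !== -1) emit(null, doc.name); }";

const docs = [
  { _id: "a", name: "Dobros" },
  { _id: "b", name: "Brown" }, // capital B: no match
  { _id: "c", name: "brock" },
  { _id: "d" }, // no name field: skipped by the guard
];

console.log(runMap(mapSource, docs));
```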

2. The "tons of disk space available" way

Depending on your source data size, you could create an index that emits every suffix of the string (so that every possible "interior string" is the start of some key) as its permanent (on-disk) view key. That is to say, for a name like "Dobros" you would emit("dobros"); emit("obros"); emit("bros"); emit("ros"); emit("os"); emit("s");. Then for a term like '%bro%' you could query your view with startkey="bro"&endkey="bro\uFFFF" to get all occurrences of the lookup term. A string of length n produces n keys totalling about n²/2 characters, so your index grows roughly quadratically with the length of each indexed string; but if you need to do an arbitrary "find in string" faster than the full DB scan above and have the space, this might work. You'd be better served by a data structure designed for substring searching, though.
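A sketch of that suffix index, with a tiny local harness so the range query can be checked without a server. `buildIndex` and `rangeQuery` are hypothetical and only approximate CouchDB: JavaScript compares strings by UTF-16 code unit, while CouchDB collates view keys with ICU rules, so ordering can differ for non-ASCII or mixed-case keys. The map lowercases names, as the "Dobros" example does, which also makes the lookup case-insensitive.

```javascript
// In CouchDB, `emit` is a global supplied by the view server; here the
// harness below assigns it so the map function can run locally.
let emit;

// Suffix-index map: emits every suffix of the lowercased name, turning a
// substring search into a key-prefix (range) search.
function suffixMap(doc) {
  if (typeof doc.name !== "string") return;
  var name = doc.name.toLowerCase();
  for (var i = 0; i < name.length; i++) {
    emit(name.slice(i), null); // "dobros", "obros", "bros", ...
  }
}

// Hypothetical stand-in for the view index: collect rows, sort by key.
function buildIndex(mapFn, docs) {
  const rows = [];
  let currentId = null;
  emit = (key, value) => rows.push({ id: currentId, key, value });
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Emulates ?startkey="bro"&endkey="bro\uFFFF" (both ends inclusive).
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const index = buildIndex(suffixMap, [
  { _id: "1", name: "Dobros" },
  { _id: "2", name: "Brown" },
  { _id: "3", name: "Alice" },
]);

// One document can match through several suffixes, so deduplicate by id.
const ids = [...new Set(rangeQuery(index, "bro", "bro\uFFFF").map((r) => r.id))];
console.log(ids); // [ '1', '2' ]
```

A real query against such a view can likewise return the same document more than once (for example "brobro"), so clients should deduplicate by `id`.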

Which brings us to...

3. The Full Text Search way

You could use a CouchDB plugin (couchdb-lucene now via Dreyfus/Clouseau for 2.x, ElasticSearch, SQLite's FTS) to generate an auxiliary text-oriented index into your documents.

Note that most full text search indexes don't naturally support arbitrary wildcard prefixes either, likely for similar reasons of space efficiency as we saw above. Usually full text search doesn't imply "brute force binary search", but "word search". YMMV, though; take a look around at the options available in your full text engine.

If you don't really need to find "bro" anywhere in a field, you can implement a basic "find a word starting with X" search with regular CouchDB views by splitting on various locale-specific word separators and emitting these "words" as your view keys. This will be more efficient than the suffix approach above, since the index size scales proportionally to the amount of data indexed.
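A sketch of that word-prefix view, using the same kind of hypothetical local harness (JavaScript string order, not CouchDB's ICU collation). The separator set in the regular expression is an assumption: a crude, English-centric stand-in for real locale-aware word segmentation. It also shows the tradeoff: "Dobros" is no longer found for "bro", because "bro" does not start a word there.

```javascript
let emit; // supplied by CouchDB's view server; assigned by the harness here

// Word-prefix map: one row per lowercased word. The separator set is an
// assumption, not a locale-aware tokenizer.
function wordMap(doc) {
  if (typeof doc.name !== "string") return;
  var words = doc.name.toLowerCase().split(/[\s.,;:!?()"'-]+/);
  for (var i = 0; i < words.length; i++) {
    if (words[i]) emit(words[i], null); // skip empty strings from edge separators
  }
}

// Hypothetical stand-in for the view index: collect rows, sort by key.
function buildIndex(mapFn, docs) {
  const rows = [];
  let currentId = null;
  emit = (key, value) => rows.push({ id: currentId, key, value });
  for (const doc of docs) {
    currentId = doc._id;
    mapFn(doc);
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Emulates ?startkey="bro"&endkey="bro\uFFFF" (both ends inclusive).
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((r) => r.key >= startkey && r.key <= endkey);
}

const index = buildIndex(wordMap, [
  { _id: "1", name: "Bob Brown" },
  { _id: "2", name: "Dobros" },
  { _id: "3", name: "Ann Brody-Smith" },
]);

const ids = [...new Set(rangeQuery(index, "bro", "bro\uFFFF").map((r) => r.id))];
console.log(ids); // [ '3', '1' ]
```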

answered Sep 22 '22 18:09 by natevw



Unfortunately, searches using LIKE '%...%' aren't really how CouchDB views work, but you can get a great deal of search capability by installing couchdb-lucene. It's a full-text search engine that creates indexes on your database, which you can then use for more sophisticated searches.

The typical way to "search" a database for a given key, without any 3rd party tools, is to create a view that emits the value you are looking for as the key. In your example:

function (doc) {
    // Key each row by name so it can be range-queried; skip docs without one.
    if (doc.name) {
        emit(doc.name, doc);
    }
}

This outputs a list of all the names in your database.

Now you would "search" based on the first letters of your key. For example, if you are searching for names that start with "bro":

/db/_design/test/_view/names?startkey="bro"&endkey="brp"&inclusive_end=false

Notice that I took the search parameter and "incremented" its last letter ("bro" becomes "brp"); inclusive_end=false keeps a name that is exactly "brp" out of the results. Again, if you want to perform searches, rather than aggregate statistics, you should use a full-text search engine like Lucene (see above).
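A sketch of computing that key range in code. `prefixRange` and `viewUrl` are hypothetical helpers. They increment the last UTF-16 code unit (so "bro" becomes "brp"), JSON-encode each key because view keys are JSON values, URL-encode the result, and add `inclusive_end=false` so a name that is exactly the end key is excluded.

```javascript
// Turns a prefix into a [startkey, endkey) pair by "incrementing" the last
// character. Works on UTF-16 code units, so a prefix ending in a surrogate
// pair or in U+FFFF is not handled by this sketch.
function prefixRange(prefix) {
  if (prefix.length === 0) throw new Error("prefix must be non-empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  const endkey = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { startkey: prefix, endkey };
}

// Builds the view URL; keys are JSON-encoded, then URL-encoded.
function viewUrl(db, ddoc, view, prefix) {
  const { startkey, endkey } = prefixRange(prefix);
  const qs = [
    "startkey=" + encodeURIComponent(JSON.stringify(startkey)),
    "endkey=" + encodeURIComponent(JSON.stringify(endkey)),
    "inclusive_end=false", // exclude a key equal to endkey itself
  ].join("&");
  return `/${db}/_design/${ddoc}/_view/${view}?${qs}`;
}

console.log(viewUrl("db", "test", "names", "bro"));
// /db/_design/test/_view/names?startkey=%22bro%22&endkey=%22brp%22&inclusive_end=false
```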

answered Sep 20 '22 18:09 by Dominic Barnes
