Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can $in / $or queries use indexes?

I'm playing with Mango queries on a CouchDB 2.0 instance, through the fantastic pouchdb-find.

A few times I got the dreaded no matching index found, create an index to optimize query time warning even though I was using indexed fields.

Just now I got it when selecting "type": {"$in": ["a", "b"]} or the equivalent "$or": [{"type": "a"}, {"type": "b"}] , even though an index on type exists.

Googling (cloudant query docs, pouchdb-find docs, SO question) didn't help, and in the latter @nlawson says that some predicates ($ne in the aforementioned question, but maybe my $in / $or fall into the same basket?) "currently do not use any index".

  • If I'm indeed in the same boat, what does that mean? Is the impossibility to use indexes on queries using certain predicates a limitation of the mango backend, or a pouchdb one?
  • Am I doing something wrong / is there an index workaround to avoid this?
    • More generally, is there documentation I could read to get a deeper grip on how indexes work and how to troubleshoot them?

Thanks!

like image 297
Ronan Jouchet Avatar asked Nov 18 '16 21:11

Ronan Jouchet


Video Answer


1 Answers

Answering my question: no, $in/or queries cannot use indices. I asked the question in this user@couchdb mailing list thread, where Garren Smith answered and pointed to Understanding Mango View-Based Indexes vs. Search-Based Indexes and A look under the covers of PouchDB-find. To quote Garren,

The reason that adding "_id": {"$gt": 0} works is because pouchdb-find/mango fetches all the docs using the _all_docs index and then processes the $in operator in memory.

If you have a large database this will hurt. But you can use a better value than 0 to reduce the number of documents that need to be sorted in memory, which is a good thing.

So, careful, "_id": {"$gt": 0}is by no means a way to use an index (that's the impression I got when @markwatsonatx suggested it), it's only a way to suppress pouchdb-find's warning, by telling it "I know this won't fit a map/reduce, I'll be running in-memory operations on allDocs, and I'm aware of the perf. consequences". Also,

The warning is just to help anyone new to using Mango that what they are doing isn't the best way on a large database but will be fine on a small database. It's a fine way to experiment but once you start noticing performance issues, creating an index is the way forward.

I'll complement with a little benchmark I made, comparing different approaches to fetch {10, 100, 1000, 10000} "cases" (using selectors on an indexed field) from a db containing [10000 cases, 100000 noise documents]

|number of cases fetched|10 |100 |1000 |10000 | |-----------------------|------|-------|--------|------| |$in |2452ms|2539ms |2474ms |5032ms| |$in + $gt |905ms |784ms |1014ms |3805ms| |$in + $gt + $lt |5ms |13ms |100ms |3854ms| |$or |2638ms|11763ms|101279ms|- |

like image 123
Ronan Jouchet Avatar answered Sep 19 '22 13:09

Ronan Jouchet