 

Index Bounds on Mongo Regex Search

I'm using MongoDB, and I have a collection of documents with the following structure:

{
    fName:"Foo",
    lName:"Barius",
    email:"[email protected]",
    search:"foo barius"
}

I am building a function that will perform a regular expression search on the search field. To optimize performance, I have indexed this collection on the search field. However, things are still a bit slow. So I ran an explain() on a sample query:

db.Collection.find({search:/bar/}).explain();

Looking under the winning plan, I see the following index bounds used:

"search": [
        "[\"\", {})",
        "[/.*bar.*/, /.*bar.*/]"
]

The second set makes sense - it's looking from anything that contains bar to anything that contains bar. However, the first set baffles me. It appears to be looking in the bounds of "" inclusive to {} exclusive. I'm concerned that this extra set of bounds is slowing down my query. Is it necessary to keep? If it's not, how can I prevent it from being included?

vavskjuta asked Jul 05 '16 18:07



2 Answers

I think it's just the way MongoDB works with regex (see https://scalegrid.io/blog/mongodb-regular-expressions-indexes-performance/). Watch the nscanned value (totalKeysExamined in newer explain output): if it is close to the total number of keys in the index, the index is not really helping your query.
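A minimal sketch of that check, assuming you run the query with `explain("executionStats")` and pass the resulting document to a small helper. The helper name `indexSelectivity` is hypothetical; `executionStats.totalKeysExamined`, `totalDocsExamined` and `nReturned` are fields of MongoDB's executionStats explain output (older versions report `nscanned` instead). The numbers in the mocked object are made up for illustration.

```javascript
// Hypothetical helper: summarize how selective an index scan was,
// given the object returned by cursor.explain("executionStats").
function indexSelectivity(explainOutput) {
  const stats = explainOutput.executionStats;
  const keys = stats.totalKeysExamined;
  const returned = stats.nReturned;
  return {
    keysExamined: keys,
    docsExamined: stats.totalDocsExamined,
    returned: returned,
    // Keys scanned per returned document; a large value means the
    // index bounds are wide (e.g. every string key for /bar/).
    keysPerResult: returned === 0 ? keys : keys / returned,
  };
}

// In the mongo shell (requires a running server):
// const out = db.Collection.find({ search: /bar/ }).explain("executionStats");
// printjson(indexSelectivity(out));

// Offline example with a mocked (made-up) explain document:
const mocked = {
  executionStats: { nReturned: 2, totalKeysExamined: 10000, totalDocsExamined: 2 },
};
console.log(indexSelectivity(mocked).keysPerResult); // 5000
```

A ratio near 1 means the bounds were tight; a ratio in the thousands means the scan walked most of the index to find a handful of matches.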

See also: MongoDB, performance of query by regular expression on indexed fields

blubear answered Oct 14 '22 21:10


This is just how MongoDB handles this kind of regex with an index: you are searching for /bar/ rather than the anchored /^bar/.

An index on that field orders its keys by the whole string, starting from the first character, so "foo barius" is filed under f. Because you are searching for "bar" anywhere in the value, MongoDB cannot jump to one section of the index; it has to walk every key on that field and test each one for bar.
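To make the difference concrete, here is a small simulation in plain JavaScript (no MongoDB involved). A sorted array stands in for the index on search: an anchored prefix search can binary-search to the first candidate and stop at the first key that no longer matches, while an unanchored /bar/ has to test every key. The function names and sample keys are hypothetical, and a real B-tree scan is more involved than this.

```javascript
// A sorted array of strings stands in for the B-tree index on `search`.
const index = ["alice smith", "bar jones", "barney rubble", "foo barius", "zed baron"].sort();

// Unanchored pattern: "bar" can appear anywhere, so every key must be tested.
function scanUnanchored(keys, re) {
  let examined = 0;
  const hits = [];
  for (const k of keys) {
    examined++;
    if (re.test(k)) hits.push(k);
  }
  return { hits, examined };
}

// Anchored prefix: binary-search to the first key >= prefix, then walk
// forward only while keys still start with the prefix.
function scanPrefix(keys, prefix) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  let examined = 0;
  const hits = [];
  for (let i = lo; i < keys.length; i++) {
    examined++; // counts the first key past the range too
    if (!keys[i].startsWith(prefix)) break;
    hits.push(keys[i]);
  }
  return { hits, examined };
}

console.log(scanUnanchored(index, /bar/).examined); // 5 (every key)
console.log(scanPrefix(index, "bar"));              // { hits: [ 'bar jones', 'barney rubble' ], examined: 3 }
```

Note that the anchored search does not find "foo barius", which is exactly why the stored string has to start with the text you search for.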

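The first line of your index bounds, ["", {}), says to look at every string key in the index (all strings sort before embedded documents, so that range covers every string). Each of those keys is then tested against /bar/, which is why the query is slow on a large collection.

The second line is a single point: the regex value itself. It covers documents whose search field stores a regular expression object rather than a string, which an implicit {search: /bar/} query can also match. It does not narrow down the first range.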

Bottom line: design your records so they use the index efficiently. For strings, make sure the text you search for is at the beginning of the indexed value and anchor the pattern, e.g., /^bar/. If I'm going to search by last name, then it needs to come first in an indexed field.
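One way to apply that advice without reordering fields uses a different technique from the one in the question: store the searchable words as an array, so a multikey index holds one key per word and every word becomes a prefix that an anchored regex such as /^bar/ can use. The sketch below only builds the document shape; the field name `searchTokens` and the helper `buildSearchTokens` are hypothetical, and the shell commands in the comments assume the collection is named Collection as in the question.

```javascript
// Hypothetical helper: lower-case the name fields and split them into words,
// so each word can be indexed separately in a multikey index.
function buildSearchTokens(doc) {
  const text = [doc.fName, doc.lName].filter(Boolean).join(" ").toLowerCase();
  // Split on whitespace, drop empty strings, and remove duplicate words.
  return [...new Set(text.split(/\s+/).filter((w) => w.length > 0))];
}

const doc = { fName: "Foo", lName: "Barius" };
doc.searchTokens = buildSearchTokens(doc);
console.log(doc.searchTokens); // [ 'foo', 'barius' ]

// In the mongo shell (assumed setup, requires a running server):
// db.Collection.createIndex({ searchTokens: 1 });  // multikey because the field is an array
// db.Collection.find({ searchTokens: /^bar/ });    // anchored regex, bounded index scan
```

The trade-off is a larger index (one key per word per document) in exchange for prefix searches on any word, not just the first one.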

As an exercise, run explain() on /^bar/ instead. It won't return your document (its search value starts with foo, not bar), but the first index bounds will be the string range from "bar" up to "bas" rather than every string in the index.
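The upper bound comes from incrementing the last character of the prefix: every string starting with "bar" sorts at or after "bar" and strictly before "bas". A minimal sketch of that idea, assuming a plain ASCII prefix with no regex metacharacters; `prefixBounds` and `inBounds` are hypothetical names, and MongoDB's own bounds builder handles more cases than this.

```javascript
// Hypothetical helper: compute the half-open string range [lower, upper)
// that contains every string beginning with `prefix` (ASCII assumed).
function prefixBounds(prefix) {
  if (prefix.length === 0) throw new Error("empty prefix has no finite upper bound");
  const last = prefix.charCodeAt(prefix.length - 1);
  // Assumes the last char is not the maximum code unit, so +1 is valid.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

// True when `s` falls inside the half-open range [lower, upper).
function inBounds(s, [lower, upper]) {
  return s >= lower && s < upper;
}

const bounds = prefixBounds("bar");
console.log(bounds);                         // [ 'bar', 'bas' ]
console.log(inBounds("barius", bounds));     // true
console.log(inBounds("foo barius", bounds)); // false
```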

I hope my stream of consciousness answer is helpful.

uDude answered Oct 14 '22 20:10