
MongoDB, performance of query by regular expression on indexed fields

I want to find an account by name in a MongoDB collection of 50K accounts.

The usual way is to query with an exact string:

db.accounts.find({ name: 'Jon Skeet' })  // indexes help improve performance! 

What about a regular expression? Is it an expensive operation?

db.accounts.find({ name: /Jon Skeet/ })  // worried: how do indexes work with a regex?

Edit:

According to WiredPrairie, MongoDB uses the literal prefix of a regex to look up the index (e.g. /^prefix.*/):

db.accounts.find({ name: /^Jon Skeet/ })  // indexes will help!
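
If the name comes from user input, one way to keep the regex a plain anchored prefix is to escape the input before adding the ^. The sketch below is only an illustration: escapeRegex and prefixRegex are hypothetical helper names, not part of MongoDB or its drivers, and the find call is shown as a comment because it needs a running server.

```javascript
// Hypothetical helper (not a MongoDB API): escape regex metacharacters
// so that user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-sensitive regex anchored at the
// start of the string, i.e. a literal prefix match.
function prefixRegex(userInput) {
  return new RegExp("^" + escapeRegex(userInput));
}

// Usage against the collection from the question (requires a server):
// db.accounts.find({ name: prefixRegex("Jon Skeet") })
```

The regex is left case-sensitive on purpose; this sketch assumes that adding the i flag would keep MongoDB from narrowing the search to a single prefix range.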

MongoDB $regex

asked Jul 06 '13 10:07 by damphat



2 Answers

Actually, according to the documentation:

If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use

In other words:

For the regex /Jon Skeet/, MongoDB scans every key in the index and then fetches only the matching documents, which can be faster than a collection scan.

For the regex /^Jon Skeet/, MongoDB scans only the range of index keys that start with the prefix, which is faster still.
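
The difference between the two cases can be sketched with a toy model in which the index is just a sorted array of key strings. This is not how MongoDB implements its B-tree indexes; the sample names and the functions scanAllKeys, lowerBound and scanPrefixRange are made up for illustration. It only shows why a literal prefix lets the scan touch fewer keys.

```javascript
// Toy "index": the indexed field values kept in sorted order (sample data).
const indexKeys = [
  "Alice Smith", "Jon Skeet", "Jon Skeet Jr",
  "Jon Snow", "Mr Jon Skeet", "Zoe Jon Skeet",
].sort();

// Unanchored regex: every key in the index has to be tested.
function scanAllKeys(keys, regex) {
  let examined = 0;
  const matches = [];
  for (const key of keys) {
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { matches, examined };
}

// Binary search for the first position whose key is >= target.
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Literal prefix: jump to the start of the range and stop at the first
// key that no longer starts with the prefix.
function scanPrefixRange(keys, prefix) {
  let examined = 0;
  const matches = [];
  for (let i = lowerBound(keys, prefix); i < keys.length; i++) {
    examined++;
    if (!keys[i].startsWith(prefix)) break;
    matches.push(keys[i]);
  }
  return { matches, examined };
}

const full = scanAllKeys(indexKeys, /Jon Skeet/);
const ranged = scanPrefixRange(indexKeys, "Jon Skeet");
console.log(full.examined, ranged.examined);
```

On a real deployment, the number of index keys examined is reported by explain("executionStats") on the query, which is the way to confirm what the server actually does.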

answered Oct 02 '22 10:10 by m_elsayed


In case anyone still has an issue with search performance, there is a way to optimize regex search even if it searches for a word in a sentence (not necessarily at the beginning ^ or the end $ of the string).

The field should have a text index:

db.someCollection.createIndex({ someField: "text" })

and queries should apply the regex only in combination with a plain $text search:

db.someCollection.find({
  $and: [
    { $text: { $search: "someWord" } },
    // one condition per pattern: repeating the $regex key inside a single
    // object would keep only the last one
    { someField: { $regex: /test/i } },
    { someField: { $regex: /other/i } }
  ]
})

This ensures that the regex will run only for the results of the initial, plain search, which should be quite fast thanks to the index on this field. It might have a huge impact on search performance, depending on how large the collection is.
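
The idea of narrowing first and running the regex second can be sketched outside the database. The toy model below is an assumption-laden illustration, not MongoDB's text index: the sample documents, buildWordIndex and searchThenRegex are all hypothetical, and the word index does no stemming or stop-word handling.

```javascript
// Sample documents (made up for this sketch).
const docs = [
  { _id: 1, someField: "a quick test of someWord here" },
  { _id: 2, someField: "someWord appears but nothing else" },
  { _id: 3, someField: "test without the keyword" },
  { _id: 4, someField: "another TEST mentioning someword" },
];

// Toy inverted index: lowercase word -> set of document ids.
function buildWordIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    const words = doc.someField.toLowerCase().split(/\W+/).filter(Boolean);
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Stage 1: look the word up in the index to get candidate ids.
// Stage 2: run the regex only against those candidates.
function searchThenRegex(documents, wordIndex, word, regex) {
  const byId = new Map(documents.map((doc) => [doc._id, doc]));
  const candidateIds = wordIndex.get(word.toLowerCase()) || new Set();
  let regexTests = 0;
  const matches = [];
  for (const id of candidateIds) {
    regexTests++;
    const doc = byId.get(id);
    if (regex.test(doc.someField)) matches.push(doc);
  }
  return { matches, regexTests };
}

const wordIndex = buildWordIndex(docs);
const result = searchThenRegex(docs, wordIndex, "someWord", /test/i);
console.log(result.matches.map((d) => d._id), result.regexTests);
```

Document 3 contains "test" but is never checked by the regex because it does not contain the search word, which is also the trade-off of this approach: the regex can only narrow the results of the word search, not widen them.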

answered Oct 02 '22 12:10 by Sebastian