I want to find an account by name (in a MongoDB collection of 50K accounts).
The usual way is an exact string match:
db.accounts.find({ name: 'Jon Skeet' }) // indexes help improve performance!
What about a regular expression? Is it an expensive operation?
db.accounts.find({ name: /Jon Skeet/ }) // worry: how do indexes work with a regex?
Edit:
According to WiredPrairie:
MongoDB uses the prefix of a regex to look up the index (e.g. /^prefix.*/):
db.accounts.find({ name: /^Jon Skeet/ }) // indexes will help!
MongoDB $regex
Indexes store a small portion of each collection's data set into separate traversable data structures. These indexes then enable your queries to perform at faster speeds by minimizing the number of disk accesses required with each request.
Performance. Because the index contains all fields required by the query, MongoDB can both match the query conditions and return the results using only the index. Querying only the index can be much faster than querying documents outside of the index.
Supported indexing strategies such as compound, unique, array, partial, TTL, geospatial, sparse, hash, wildcard and text ensure optimal performance for multiple query patterns, data types, and application requirements. Indexes are strongly consistent with the underlying data.
Actually, according to the documentation:
If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.
http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use
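To make the "range" idea from the quote concrete, here is a toy sketch in plain JavaScript. It is not MongoDB's implementation: the sorted array stands in for an index on name, and the sample names plus the scanAllKeys, lowerBound, and scanPrefixRange helpers are made up for illustration. It only shows why a prefix lets a sorted structure skip most keys.

```javascript
// Toy model of an index on `name`: just the keys, kept in sorted order.
const indexKeys = [
  "Alice Wong", "Jon Postel", "Jon Skeet", "Jon Skeet Jr",
  "Jonas Salk", "Mary Jon Skeet", "Zed Shaw",
].sort();

// Unanchored pattern: every key has to be tested against the regex.
function scanAllKeys(keys, regex) {
  let examined = 0;
  const matches = [];
  for (const key of keys) {
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { matches, examined };
}

// Binary search: first position whose key is >= target.
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Prefix pattern: only keys in [prefix, prefix + "\uffff") can match,
// so the scan is limited to that slice of the sorted keys.
function scanPrefixRange(keys, prefix) {
  const start = lowerBound(keys, prefix);
  const end = lowerBound(keys, prefix + "\uffff");
  return { matches: keys.slice(start, end), examined: end - start };
}

const unanchored = scanAllKeys(indexKeys, /Jon Skeet/);
const anchored = scanPrefixRange(indexKeys, "Jon Skeet");
console.log(unanchored.examined, anchored.examined); // 7 2
```

With 7 keys the difference is trivial, but the unanchored count grows with the index size while the prefix count grows only with the number of keys sharing the prefix.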
In other words:
For the /Jon Skeet/ regex, MongoDB scans every key in the index and then fetches the matching documents, which can be faster than a collection scan.
For the /^Jon Skeet/ regex, MongoDB scans only the index range that starts with the prefix, which is faster still.
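One way to check this on your own data is to compare the execution statistics of the two queries. The sketch below assumes an index on name already exists; scanStats is a hypothetical helper name, while explain("executionStats") and the totalKeysExamined, totalDocsExamined, and nReturned fields are what the shell reports.

```javascript
// Hypothetical helper: summarize how much work a query did.
// `collection` is assumed to behave like a mongosh collection such as db.accounts.
function scanStats(collection, filter) {
  const stats = collection.find(filter).explain("executionStats").executionStats;
  return {
    keysExamined: stats.totalKeysExamined, // index keys scanned
    docsExamined: stats.totalDocsExamined, // documents fetched
    returned: stats.nReturned,             // documents actually matched
  };
}

// In mongosh, with an index on `name` (assumed):
//   scanStats(db.accounts, { name: /Jon Skeet/ })
//   scanStats(db.accounts, { name: /^Jon Skeet/ })
```

If the explanation above holds for your collection, the anchored query should report far fewer keys examined than the unanchored one.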
In case anyone still has an issue with search performance, there is a way to optimize a regex search even if it looks for a word inside a sentence (not necessarily anchored at the beginning ^ or the end $ of the string).
The field should have a text index:
db.someCollection.createIndex({ someField: "text" })
and queries on it should apply the regex only after performing a plain text search first:
db.someCollection.find({ $and: [ { $text: { $search: "someWord" } }, { someField: /test/i }, { someField: /other/i } ] }) // one clause per pattern: duplicate $regex keys in a single object overwrite each other
This ensures that the regex will run only for the results of the initial, plain search, which should be quite fast thanks to the index on this field. It might have a huge impact on search performance, depending on how large the collection is.
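The "narrow first, regex second" idea can be sketched in plain JavaScript. This is only a toy model under stated assumptions, not how MongoDB's text index is implemented: the docs array, the wordIndex map, and the search function are all invented for illustration.

```javascript
// Toy documents; `text` stands in for the text-indexed field.
const docs = [
  { id: 1, text: "someWord appears in a test of other things" },
  { id: 2, text: "someWord appears here without the rest" },
  { id: 3, text: "a test of other things only" },
];

// Build a word -> document ids map, a rough stand-in for a text index.
const wordIndex = new Map();
for (const doc of docs) {
  for (const word of doc.text.toLowerCase().split(/\W+/).filter(Boolean)) {
    if (!wordIndex.has(word)) wordIndex.set(word, new Set());
    wordIndex.get(word).add(doc.id);
  }
}

// Stage 1: a cheap word lookup narrows the candidates.
// Stage 2: the regexes run only on those candidates.
function search(word, patterns) {
  const candidateIds = wordIndex.get(word.toLowerCase()) ?? new Set();
  const candidates = docs.filter((d) => candidateIds.has(d.id));
  const matches = candidates.filter((d) => patterns.every((p) => p.test(d.text)));
  return { regexTested: candidates.length, ids: matches.map((d) => d.id) };
}

const result = search("someWord", [/test/i, /other/i]);
console.log(result); // { regexTested: 2, ids: [ 1 ] }
```

Document 3 matches both patterns but is never tested, because the word lookup already excluded it. That is the trade-off of this approach: the plain search term must be present for the regex stage to see a document at all.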