MongoDB regex matching trouble

Here is my MongoDB shell session:

> db.foo.save({path: 'a:b'})
WriteResult({ "nInserted" : 1 })

> db.foo.findOne()
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }

> db.foo.save({path: 'a:b:c'})
WriteResult({ "nInserted" : 1 })

> db.foo.find({path: /a:[^:]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }

> db.foo.find({path: /a:[a-z]+/})
{ "_id" : ObjectId("58fedc47622e89329d123ee8"), "path" : "a:b" }
{ "_id" : ObjectId("58fedc57622e89329d123ee9"), "path" : "a:b:c" }

Clearly the regexes /a:[^:]+/ and /a:[a-z]+/ shouldn't match the string 'a:b:c', but Mongo returns that document for both of them. Does anyone know what is happening here?

This was submitted to the MongoDB Jira as a bug ticket. Is it actually a bug in MongoDB's query handling?

Gelin Luo asked Apr 25 '17 05:04




1 Answer

The trouble is partial matching. Since the regex is not required to match the whole string, it matches the substring a:b inside a:b:c, and that is why you get that document back.

Use the following regexes, where ^ and $ are anchors that mark the beginning and the end of the string:

db.foo.find({path: /^a:[^:]+$/})
db.foo.find({path: /^a:[a-z]+$/})

This makes the regex apply to the whole string, so the partial matches explained above are ignored.
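The difference can be checked without a database. The sketch below uses plain JavaScript regular expressions rather than MongoDB's PCRE engine, on the assumption that both behave the same way for these simple patterns; the variable names are only illustrative.

```javascript
// The two path values from the shell session in the question.
const paths = ['a:b', 'a:b:c'];

// Unanchored: succeeds if the pattern matches anywhere inside the string.
const unanchored = /a:[^:]+/;
// Anchored: the pattern must cover the string from start (^) to end ($).
const anchored = /^a:[^:]+$/;

const unanchoredHits = paths.filter((p) => unanchored.test(p));
const anchoredHits = paths.filter((p) => anchored.test(p));

// The substring of 'a:b:c' that satisfies the unanchored pattern.
const partial = 'a:b:c'.match(unanchored)[0];

console.log(unanchoredHits); // [ 'a:b', 'a:b:c' ]
console.log(anchoredHits);   // [ 'a:b' ]
console.log(partial);        // a:b
```

The unanchored pattern stops at the second colon and reports a match on a:b, which is enough for the query to return the a:b:c document. With the anchors in place there is no way to consume the trailing :c, so only a:b remains.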

So, in summary, there is no bug, just a misuse of regex.

buræquete answered Sep 19 '22 01:09
