
Mongo $text query: return docs "starting with" string before others

Let's say I have a mongo collection with a text index on the itemName field with these 3 documents:

{
    _id: ...,
    itemName: 'Mashed carrots with big carrot pieces',
    price: 1.29
},
{
    _id: ...,
    itemName: 'Carrot juice',
    price: 0.79
},
{
    _id: ...,
    itemName: 'Apple juice',
    price: 1.49
}

I then exec a query like so:

db.items.find({ $text: { $search: 'Car' } }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

How do I force mongo to return documents beginning with "Car" (case insensitive) before returning any other docs also containing "Car" somewhere in the itemName string?

So I want to retrieve the docs in the following order:

[
    {..., itemName: 'Carrot juice', ...},
    {..., itemName: 'Mashed carrots with big carrot pieces', ...}
]

Of course, this is meant for a search feature, so it makes sense to show the user the items starting with their search string before showing any other items.

Until now I was using a standard regex, but the performance is of course much worse. And since I have to search case-insensitively, a regex will, according to the docs, not use any indexes at all?!

EDIT:

Also, sometimes the behavior of $text is very weird. For example I have about 10-15 items where itemName begins with the word "Zwiebel". This query

db.items.find({ $text: { $search: "Zwiebel" }, supplier_id: 'iNTJHEf5YgBPicTrJ' }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

works like a charm and returns all those documents, while this query

db.items.find({ $text: { $search: "Zwie" }, supplier_id: 'iNTJHEf5YgBPicTrJ' }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

does not return anything! The only change is replacing "Zwiebel" with "Zwie" in the $search.

I really do not understand how this is possible?!

best, P

Patrick DaVader asked Feb 26 '17 12:02




1 Answer

One solution is to use the $indexOfCP aggregation operator, introduced in MongoDB 3.4.

This operator returns the zero-based index of the first occurrence of a string within another string, or -1 if there is no occurrence.

How it works:

  1. Filter out all documents that do not contain 'car', using the regex /car/i (the i flag makes it case-insensitive).
  2. Create a field named index that stores the position of the first occurrence of 'car' in the lowercased itemName.
  3. Sort the documents on the index field, so names starting with 'car' (index 0) come first.

The query would look like this:

db.items.aggregate([
   {
      $match:{
         itemName:/car/i
      }
   },
   {
      $project:{
         index:{
            $indexOfCP:[
               {
                  $toLower:"$itemName"
               },
               "car"
            ]
         },
         price:1,
         itemName:1
      }
   },
   {
      $sort:{
         index:1
      }
   }
])

For the three sample documents, this returns:

{ "_id" : 2, "itemName" : "Carrot juice", "price" : 0.79, "index" : 0 }
{ "_id" : 1, "itemName" : "Mashed carrots with big carrot pieces", "price" : 1.29, "index" : 7 }

try it online: mongoplayground.net/p/FqqCUQI3D-E
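
The ranking logic of the pipeline can also be checked without a database. The following is only a plain JavaScript sketch that mimics the three stages on an in-memory array; `rankByFirstOccurrence` is a hypothetical helper name, not a MongoDB API, and the `_id` values are made up to match the output above.

```javascript
// In-memory stand-in for the three documents from the question
// (the _id values are illustrative assumptions).
const items = [
  { _id: 1, itemName: "Mashed carrots with big carrot pieces", price: 1.29 },
  { _id: 2, itemName: "Carrot juice", price: 0.79 },
  { _id: 3, itemName: "Apple juice", price: 1.49 },
];

// Hypothetical helper that mirrors the aggregation stages.
// Note: String.prototype.indexOf counts UTF-16 code units, whereas
// $indexOfCP counts code points; for plain ASCII names they agree.
function rankByFirstOccurrence(docs, term) {
  const needle = term.toLowerCase();
  return docs
    // like $project with $indexOfCP over $toLower: first position, or -1
    .map((doc) => ({ ...doc, index: doc.itemName.toLowerCase().indexOf(needle) }))
    // like $match: drop documents that do not contain the term
    .filter((doc) => doc.index !== -1)
    // like $sort: { index: 1 }
    .sort((a, b) => a.index - b.index);
}

const ranked = rankByFirstOccurrence(items, "Car");
console.log(ranked.map((doc) => `${doc.index}: ${doc.itemName}`));
```

Names that start with the search string get index 0 and therefore sort first; 'Apple juice' is dropped because it does not contain the term at all.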

Edit:

Regarding the behavior of the $text query: this is completely normal.

A text index tokenizes text using delimiters (by default, whitespace and punctuation). It can only be used to search for whole words (more precisely, their stems), so it will not match a partial word such as "Zwie".

From the MongoDB text index documentation:

$text will tokenize the search string using whitespace and most punctuation as delimiters, and perform a logical OR of all such tokens in the search string.
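
To see why "Zwie" finds nothing, here is a deliberately simplified model of token-based matching in plain JavaScript. It is only a sketch: `tokenize` and `matchesAnyToken` are hypothetical helpers, the item name "Zwiebel rot" is an invented example, and MongoDB's real tokenizer additionally applies language-specific stemming and stop words, which this model ignores.

```javascript
// Simplified tokenizer: lowercase, then split on anything that is not a
// letter or digit. This is NOT MongoDB's implementation (no stemming,
// no stop words); it only illustrates whole-token comparison.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Hypothetical helper: true if any search token equals any field token,
// mirroring the "logical OR of all such tokens" described above.
function matchesAnyToken(fieldValue, search) {
  const fieldTokens = new Set(tokenize(fieldValue));
  return tokenize(search).some((token) => fieldTokens.has(token));
}

console.log(matchesAnyToken("Zwiebel rot", "Zwiebel")); // whole word: matches
console.log(matchesAnyToken("Zwiebel rot", "Zwie"));    // partial word: no match
```

Because tokens are compared as whole units, a prefix like "Zwie" never equals the indexed token "zwiebel", which is why prefix searches need a regex or an approach like the aggregation above.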

felix answered Oct 06 '22 09:10