
How does MongoDB $text search work?

I have inserted the following documents into my events collection:

db.events.insert(
   [
     { _id: 1, name: "Amusement Ride", description: "Fun" },
     { _id: 2, name: "Walk in Mangroves", description: "Adventure" },
     { _id: 3, name: "Walking in Cypress", description: "Adventure" },
     { _id: 4, name: "Trek at Tikona", description: "Adventure" },
     { _id: 5, name: "Trekking at Tikona", description: "Adventure" }
   ]
)

I've also created a text index on the name field:

db.events.createIndex( { name: "text" } )

Now when I execute the following query (Search - Walk):

db.events.find({
    '$text': {
        '$search': 'Walk'
    },
})

I get these results:

{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" }

But when I search Trek:

db.events.find({
    '$text': {
        '$search': 'Trek'
    },
})

I get only one result:

{ _id: 4, name: "Trek at Tikona", description: "Adventure" }

So my question is: why didn't it return both of these documents?

{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }

When I searched for Walk, the results included the documents containing both walk and walking. But when I searched for Trek, only the document containing trek was returned, where I expected both trek and trekking.

Sushant K asked Dec 06 '18 13:12



1 Answer

MongoDB text search uses the Snowball stemming library to reduce words to an expected root form (or stem) based on common language rules. Algorithmic stemming provides a quick reduction, but languages have exceptions (such as irregular or contradicting verb conjugation patterns) that can affect accuracy. The Snowball introduction includes a good overview of some of the limitations of algorithmic stemming.

Your example of walking stems to walk and matches as expected.

However, your example of trekking stems to trekk so does not match your search keyword of trek.
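The difference can be illustrated with a deliberately simplified sketch of the "-ing" handling in the Snowball English stemmer. This is not the full algorithm and not MongoDB's actual code: the function name `stripIng` is hypothetical, and the list of doubled endings is taken from the published Snowball English rules as I understand them. The key point is that `kk` is not in that list, so the doubled consonant in trekking is left alone.

```javascript
// Simplified illustration only: handles just the "-ing" suffix and the
// "undouble" rule. The real Snowball English stemmer has many more steps.
const DOUBLES = ["bb", "dd", "ff", "gg", "mm", "nn", "pp", "rr", "tt"];

// Hypothetical helper name, not part of any library.
function stripIng(word) {
  const w = word.toLowerCase();
  if (!w.endsWith("ing")) return w;

  const stem = w.slice(0, -3);
  // Only strip the suffix if what remains contains a vowel
  // (so a short word like "sing" is left unchanged).
  if (!/[aeiouy]/.test(stem)) return w;

  // Drop the last letter only when the stem ends in a listed double.
  if (DOUBLES.includes(stem.slice(-2))) return stem.slice(0, -1);
  return stem;
}

console.log(stripIng("walking"));  // no doubled ending: "ing" is simply removed
console.log(stripIng("running"));  // "nn" is a listed double, so it is undoubled
console.log(stripIng("trekking")); // "kk" is not a listed double, so it stays
```

Under this sketch, walking reduces to the same stem as walk, while trekking keeps its doubled consonant and so no longer lines up with trek.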

You can confirm this by explaining your query and reviewing the parsedTextQuery information which shows the stemmed search terms used:

db.events.find({$text: {$search: 'Trekking'} }).explain().queryPlanner.winningPlan.parsedTextQuery
{
   "terms" : [
       "trekk"
   ],
   "negatedTerms" : [ ],
   "phrases" : [ ],
   "negatedPhrases" : [ ]
}

You can also check expected Snowball stemming using the online Snowball Demo or by finding a Snowball library for your preferred programming language.

To work around exceptions that might commonly affect your use case, you could consider adding another field to your text index with keywords to influence the search results. For this example, you would add trek as a keyword so that the event described as trekking also matches in your search results.
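A minimal sketch of that workaround for the sample data might look like the following. The field name `keywords` and the helper name `addTrekKeyword` are assumptions for illustration. The index name `name_text` assumes the original index was created with the default name, and the existing index is dropped first because a collection can only have one text index.

```javascript
// Hypothetical helper; `db` is the database handle available in the mongo shell.
function addTrekKeyword(db) {
  // Add a keyword to the document whose name stems differently ("trekk").
  db.events.updateOne({ _id: 5 }, { $set: { keywords: ["trek"] } });

  // A collection can have only one text index, so drop the existing one
  // (assumed to have the default name "name_text") before recreating it.
  db.events.dropIndex("name_text");

  // Recreate the text index so it covers both name and the new keywords field.
  db.events.createIndex({ name: "text", keywords: "text" });

  // Re-run the original search; returns a cursor over the matching documents.
  return db.events.find({ $text: { $search: "Trek" } });
}
```

Calling `addTrekKeyword(db)` in the shell should then match document 5 through its keyword as well as document 4 through its name. The trade-off is that the keywords have to be maintained by hand (or by application code) for each exception you care about.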

There are other approaches for more accurate inflection which are generally referred to as lemmatization. Lemmatization algorithms are more complex and start heading into the domain of natural language processing. There are many open source (and commercial) toolkits that you may be able to leverage if you want to implement more advanced text search in your application, but these are outside the current scope of the MongoDB text search feature.

Stennie answered Sep 20 '22 23:09