
MongoDB Full and Partial Text Search

Env:

  • MongoDB (3.2.0) with Mongoose

Collection:

  • users

Text Index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name", "text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);

  userCollection.createIndex(keys, options); // using MongoTemplate

Document:

  • {"name":"LEONEL"}

Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)

Any idea why I get 0 results when the query is "LEO" or "L"?

Regex with Text Index Search is not allowed.

  db.getCollection('users')
    .find( { "$text" : { "$search" : "/LEO/i",
                         "$caseSensitive": false,
                         "$diacriticSensitive": false } } )
    .count() // 0 results

  db.getCollection('users')
    .find( { "$text" : { "$search" : "LEO",
                         "$caseSensitive": false,
                         "$diacriticSensitive": false } } )
    .count() // 0 results

MongoDB Documentation:

  • Text Search
  • $text
  • Text Indexes
  • Improve Text Indexes to support partial word match
asked Jun 29 '17 19:06 by Leonel



1 Answer

As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
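To make the point about case and diacritics concrete, here is a rough sketch of that kind of folding in plain JavaScript. It is a toy approximation, not MongoDB's actual implementation: the `foldToken` helper is hypothetical, and stemming is deliberately left out. It only illustrates why whole-word variants compare equal while a fragment like "LEO" does not.

```javascript
// Toy approximation of case and diacritic folding.
// NOT MongoDB's implementation; stemming is deliberately ignored.
function foldToken(token) {
  return token
    .normalize("NFD")                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase();                   // ignore case
}

const indexed = foldToken("LEONEL");

// "LEON\u00C9L" is "LEONÉL", written with an escape to avoid encoding surprises.
const searches = ["LEONEL", "leonel", "LEON\u00C9L", "LEO", "L"];

// Keep only the search terms that fold to the same whole token as the indexed word.
const matches = searches.filter((term) => foldToken(term) === indexed);
console.log(matches);
```

Under this simplified model only the three whole-word variants survive the filter; "LEO" and "L" are different tokens, so they never compare equal to the indexed word.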

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

  • Efficient Techniques for Fuzzy and Partial matching in MongoDB by John Page
  • Efficient Partial Keyword Searches by James Tan
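One common way to get prefix matches without a text index is to precompute the leading substrings of the value and store them in an array field. The sketch below shows that idea only; it is not taken from the articles above. The `prefixesOf` and `prefixQuery` helpers and the `namePrefixes` field are hypothetical names introduced for illustration, and the code builds plain objects rather than talking to a server.

```javascript
// Hypothetical helper: every leading substring of the lowercased name.
function prefixesOf(name, minLength = 1) {
  const folded = name.toLowerCase();
  const prefixes = [];
  for (let end = minLength; end <= folded.length; end++) {
    prefixes.push(folded.slice(0, end));
  }
  return prefixes;
}

// Assumed document shape; `namePrefixes` is not part of the original schema.
const doc = { name: "LEONEL", namePrefixes: prefixesOf("LEONEL") };

// Hypothetical helper: filter object for a partial search,
// meant to be passed to db.users.find(...).
function prefixQuery(term) {
  return { namePrefixes: term.toLowerCase() };
}

console.log(doc.namePrefixes);
console.log(prefixQuery("LEO"));
```

With a regular index on the array field, for example db.users.createIndex({ namePrefixes: 1 }), a query such as db.users.find(prefixQuery("LEO")) becomes an exact match against one array element. The trade-off is a larger document and index, since each name contributes one entry per prefix, and this sketch handles case only, not diacritics.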

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.

answered Oct 24 '22 16:10 by Stennie