MongoDB : text index with arrays, only first term is indexed

I have a document with the following schema:

{
  description : String,
  tags : [String]
}

I have indexed both fields as text, but when I search for a specific string within the array, the document is returned only if that string is the first element of the array. It seems that the $text index only works for the first element. Is this how MongoDB inherently works, or is there an option that must be passed to the index?

Example document

{
   description : 'random description',
   tags : ["hello", "there"]
}

The object that created the index

{description : 'text', tags : 'text'}

The query

db.myCollection.find({$text : {$search : 'hello'}});

returns a document but

db.myCollection.find({$text : {$search : 'there'}});

does not return anything.

I am using MongoDB version 2.6.11.

I have other indexes, but this is the only text index. Here is the corresponding output of db.myCollection.getIndexes()

{
        "v" : 1,
        "key" : {
                "_fts" : "text",
                "_ftsx" : 1
        },
        "name" : "description_text_tags_text",
        "ns" : "myDB.myCollection",
        "weights" : {
                "description" : 1,
                "tags" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
},
asked Oct 18 '22 19:10 by naughty boy


1 Answer

This has nothing to do with whether the string is the first or second element of the array. The word "there" is in the stop-word list for the "english" language, so it is not added to the index at all. Text indexing stems the text and removes stop words before the terms are added to the index, and both steps are language dependent.

You can create the text index with language-specific processing turned off:
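To make the effect concrete, here is a rough simulation in plain JavaScript of how stop-word removal decides which terms reach the index. It is only a sketch: `ENGLISH_STOP_WORDS_SAMPLE` is a tiny illustrative subset, not MongoDB's actual list, `indexableTerms` is a hypothetical helper name, and stemming is left out entirely.

```javascript
// Illustrative subset only; the real English stop-word list is much longer.
const ENGLISH_STOP_WORDS_SAMPLE = new Set(["the", "a", "is", "and", "there"]);

// Hypothetical helper: returns the terms that would survive stop-word removal.
// Stemming is deliberately omitted to keep the sketch short.
function indexableTerms(values) {
  return values
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/)              // simple tokenization on non-alphanumerics
    .filter((t) => t.length > 0)
    .filter((t) => !ENGLISH_STOP_WORDS_SAMPLE.has(t));
}

// The position in the array makes no difference; "there" is dropped either way.
console.log(indexableTerms(["hello", "there"])); // [ 'hello' ]
console.log(indexableTerms(["there", "hello"])); // [ 'hello' ]
```

Swapping the order of the tags gives the same result, which is why the behaviour only looks like a "first element" problem.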

db.myCollection.ensureIndex({description : 'text', tags : 'text'}, { default_language: "none" }) 

If "none" is used as the default language, the text indexing process does simple tokenization, with no stop-word list and no stemming. By default, "english" is used as the "default_language" for a text index.
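Since a collection can have only one text index, the existing "english" index would need to be dropped before the new one is created. A minimal sketch for the mongo shell follows; `rebuildTextIndex` is a hypothetical helper name, and the index name is taken from the getIndexes() output in the question.

```javascript
// Sketch for the mongo shell, using the 2.6-era API from the question.
// rebuildTextIndex is a hypothetical helper, not a MongoDB function.
function rebuildTextIndex(collection) {
  // Drop the existing text index by name (from the getIndexes() output).
  collection.dropIndex("description_text_tags_text");

  // Recreate it with stop-word removal and stemming disabled.
  return collection.ensureIndex(
    { description: "text", tags: "text" },
    { default_language: "none" }
  );
}

// In the mongo shell:
//   rebuildTextIndex(db.myCollection);
//   db.myCollection.find({ $text: { $search: "there" } });
```

Dropping by name assumes the index still has the default name shown in the question; if it was renamed, pass that name instead.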

answered Oct 21 '22 23:10 by Nipun Talukdar