Is there any way to see all the indexed terms in a MongoDB text index?

Question

I'm trying to make my MongoDB collection searchable. I can run text searches after creating a text index on the collection:

db.products.createIndex({title: 'text'})

I'm wondering whether it's possible to retrieve a list of all the indexed terms for this collection. This would be very useful for autocompletion and spell checking/correction when people write their search queries.

felix · Accepted Answer

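There is no built-in function for this in MongoDB. However, you can approximate it with an aggregation query that counts the words in the indexed field. Keep in mind that the text index itself stems words and drops stop words, so the words produced below are not necessarily identical to the terms stored in the index.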

Let's assume that your collection contains the following documents:

{ "_id" : ObjectId("5874dbb1a1b342232b822827"), "title" : "title" }
{ "_id" : ObjectId("5874dbb8a1b342232b822828"), "title" : "new title" }
{ "_id" : ObjectId("5874dbbea1b342232b822829"), "title" : "hello world" }
{ "_id" : ObjectId("5874dbc6a1b342232b82282a"), "title" : "world title" }
{ "_id" : ObjectId("5874dbcaa1b342232b82282b"), "title" : "world meta" }
{ "_id" : ObjectId("5874dbcea1b342232b82282c"), "title" : "world meta title" }
{ "_id" : ObjectId("5874de7fa1b342232b82282e"), "title" : "something else" }

This query, which relies on the $split operator (available from MongoDB 3.4), counts how often each word occurs:

db.products.aggregate([
   {
      $project:{
         words:{
            $split:["$title"," "]
         }
      }
   },
   {
      $unwind:"$words"
   },
   {
      $group:{
         _id:"$words",
         count:{
            $sum:1
         }
      }
   },
   {
      $sort:{
         count:-1
      }
   }
])

This outputs the number of occurrences of each word:

{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "meta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }
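To make the stages of the pipeline concrete, here is a rough plain-JavaScript equivalent that runs on an in-memory array instead of a collection. It is only a sketch: countWords is a hypothetical helper written for this illustration, not a MongoDB API, and the titles are copied from the sample documents above.

```javascript
// Sample titles copied from the documents above.
const titles = [
  "title",
  "new title",
  "hello world",
  "world title",
  "world meta",
  "world meta title",
  "something else",
];

// Hypothetical helper (not part of MongoDB): mirrors the pipeline stages.
function countWords(values) {
  const counts = new Map();
  for (const value of values) {
    // $project + $split: break each title on single spaces.
    // Looping over the parts plays the role of $unwind.
    for (const word of value.split(" ")) {
      // $group with $sum: 1 -> one counter per distinct word.
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // $sort: { count: -1 } -> most frequent words first.
  return [...counts.entries()]
    .map(([word, count]) => ({ _id: word, count }))
    .sort((a, b) => b.count - a.count);
}

const wordCounts = countWords(titles);
console.log(wordCounts);
```

As with the aggregation, the relative order of words that share the same count should not be relied on.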

If you are using MongoDB 3.4 or later, you can get case-insensitive and diacritic-insensitive statistics on the words with the collation option.

For example, let's assume that our collection now contains the following documents:

{ "_id" : ObjectId("5874e057a1b342232b82282f"), "title" : "title" }
{ "_id" : ObjectId("5874e05ea1b342232b822830"), "title" : "new Title" }
{ "_id" : ObjectId("5874e067a1b342232b822831"), "title" : "hello world" }
{ "_id" : ObjectId("5874e076a1b342232b822832"), "title" : "World Title" }
{ "_id" : ObjectId("5874e085a1b342232b822833"), "title" : "World méta" }
{ "_id" : ObjectId("5874e08ea1b342232b822834"), "title" : "World meta title" }
{ "_id" : ObjectId("5874e0aea1b342232b822835"), "title" : "something else" }

Add the collation option to the aggregation query:

db.products.aggregate([
   {
      $project:{
         words:{
            $split:["$title"," "]
         }
      }
   },
   {
      $unwind:"$words"
   },
   {
      $group:{
         _id:"$words",
         count:{
            $sum:1
         }
      }
   },
   {
      $sort:{
         count:-1
      }
   }
],
{
   collation:{
      locale:"en_US",
      strength:1
   }
})

This will output:

{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "méta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }

The strength is the level of comparison to perform:

 collation.strength: 1 // case-insensitive and diacritic-insensitive
 collation.strength: 2 // case-insensitive only (diacritics still distinguish words)
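The difference between the two strengths can be tried out locally with JavaScript's Intl.Collator. This is an analogy only, not MongoDB's own API: it assumes that the "base" sensitivity behaves like strength 1 and the "accent" sensitivity like strength 2, and the helper names sameAtStrength1 and sameAtStrength2 are made up for this sketch.

```javascript
// Analogy only: Intl.Collator is the JavaScript runtime's collation API.
// sensitivity "base" ignores case and diacritics (comparable to strength 1).
const primary = new Intl.Collator("en-US", { sensitivity: "base" });
// sensitivity "accent" ignores case but keeps diacritics (comparable to strength 2).
const secondary = new Intl.Collator("en-US", { sensitivity: "accent" });

// Hypothetical helpers: two words fall into the same group when compare() returns 0.
const sameAtStrength1 = (a, b) => primary.compare(a, b) === 0;
const sameAtStrength2 = (a, b) => secondary.compare(a, b) === 0;

console.log(sameAtStrength1("Title", "title")); // true
console.log(sameAtStrength1("méta", "meta"));   // true
console.log(sameAtStrength2("Title", "title")); // true
console.log(sameAtStrength2("méta", "meta"));   // false
```

Under this assumption, running the aggregation with strength 2 would still merge "Title" and "title" but would count "méta" and "meta" separately.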

Tags: indexing, search, mongodb

Viktor Andersen
