
Is there any way to see all the indexed terms in a MongoDB text index?

I'm trying to make my MongoDB collection searchable. I can run text searches after creating a text index on the collection:

db.products.createIndex({title: 'text'})

I'm wondering whether it's possible to retrieve a list of all the index terms for this collection. This would be very useful for autocompletion and for spell checking/correction while people type their search queries.

Viktor Andersen asked Jan 21 '16 05:01


1 Answer

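MongoDB has no built-in function for this. However, you can approximate the list with an aggregation query that splits each title into words. Note that the query works on the raw field values, so its result is not identical to the terms stored in the text index (the text index also stems words and drops stop words), and that the $split operator requires MongoDB 3.4 or later.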

Let's assume that your collection contains the following documents:

{ "_id" : ObjectId("5874dbb1a1b342232b822827"), "title" : "title" }
{ "_id" : ObjectId("5874dbb8a1b342232b822828"), "title" : "new title" }
{ "_id" : ObjectId("5874dbbea1b342232b822829"), "title" : "hello world" }
{ "_id" : ObjectId("5874dbc6a1b342232b82282a"), "title" : "world title" }
{ "_id" : ObjectId("5874dbcaa1b342232b82282b"), "title" : "world meta" }
{ "_id" : ObjectId("5874dbcea1b342232b82282c"), "title" : "world meta title" }
{ "_id" : ObjectId("5874de7fa1b342232b82282e"), "title" : "something else" }

This query returns each word together with how often it occurs:

db.products.aggregate([
   {
      $project:{
         words:{
            $split:["$title"," "]
         }
      }
   },
   {
      $unwind:"$words"
   },
   {
      $group:{
         _id:"$words",
         count:{
            $sum:1
         }
      }
   },
   {
      $sort:{
         count:-1
      }
   }
])

This outputs the number of occurrences of each word:

{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "meta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }
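
Since the question mentions autocompletion, here is a minimal sketch of how the word counts above could be turned into prefix suggestions on the application side. It is plain JavaScript that runs without a server; suggest() is a hypothetical helper written for this illustration, not a MongoDB API, and the rows array simply repeats the output shown above.

```javascript
// Rows in the shape produced by the aggregation: { _id: <word>, count: <n> }.
const rows = [
  { _id: "title", count: 4 },
  { _id: "world", count: 4 },
  { _id: "meta", count: 2 },
  { _id: "else", count: 1 },
  { _id: "something", count: 1 },
  { _id: "new", count: 1 },
  { _id: "hello", count: 1 }
];

// Hypothetical helper: return up to `limit` words that start with `prefix`,
// most frequent first, with ties broken alphabetically.
function suggest(rows, prefix, limit = 5) {
  const p = prefix.toLowerCase();
  return rows
    .filter(r => r._id.toLowerCase().startsWith(p)) // filter() copies, so rows is not mutated
    .sort((a, b) => b.count - a.count || a._id.localeCompare(b._id))
    .slice(0, limit)
    .map(r => r._id);
}

console.log(suggest(rows, "me")); // [ 'meta' ]
console.log(suggest(rows, "w"));  // [ 'world' ]
```

For a large vocabulary, a linear scan like this would probably be replaced by a sorted array with binary search or by a trie, but the input data would stay the same.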

With MongoDB 3.4 or later, you can get case-insensitive and diacritic-insensitive statistics on the words by using the collation option.

For example, let's assume that our collection now contains the following documents:

{ "_id" : ObjectId("5874e057a1b342232b82282f"), "title" : "title" }
{ "_id" : ObjectId("5874e05ea1b342232b822830"), "title" : "new Title" }
{ "_id" : ObjectId("5874e067a1b342232b822831"), "title" : "hello world" }
{ "_id" : ObjectId("5874e076a1b342232b822832"), "title" : "World Title" }
{ "_id" : ObjectId("5874e085a1b342232b822833"), "title" : "World méta" }
{ "_id" : ObjectId("5874e08ea1b342232b822834"), "title" : "World meta title" }
{ "_id" : ObjectId("5874e0aea1b342232b822835"), "title" : "something else" }

Add the collation option to the aggregation query:

db.products.aggregate([
   {
      $project:{
         words:{
            $split:["$title"," "]
         }
      }
   },
   {
      $unwind:"$words"
   },
   {
      $group:{
         _id:"$words",
         count:{
            $sum:1
         }
      }
   },
   {
      $sort:{
         count:-1
      }
   }
],
{
   collation:{
      locale:"en_US",
      strength:1
   }
})

This will output:

{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "méta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }

The strength is the level of comparison to perform:

 collation.strength: 1 // case insensitive and diacritic insensitive (base characters only)
 collation.strength: 2 // case insensitive but diacritic sensitive
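
To get a feel for the two strength levels without a running server, the grouping can be imitated in plain JavaScript with Intl.Collator. This is only an analogy and not MongoDB's own code path: it assumes that the collator's sensitivity values "base" and "accent" behave like strengths 1 and 2 respectively, and countWords() is a hypothetical helper written for this sketch.

```javascript
// Titles from the second sample collection above.
const titles = [
  "title", "new Title", "hello world", "World Title",
  "World méta", "World meta title", "something else"
];

// Hypothetical helper: split each title on spaces and group the words
// that the collator considers equal. The first spelling seen names the group.
function countWords(titles, sensitivity) {
  const collator = new Intl.Collator("en", { sensitivity });
  const groups = [];
  for (const title of titles) {
    for (const word of title.split(" ")) {
      const hit = groups.find(g => collator.compare(g._id, word) === 0);
      if (hit) {
        hit.count += 1;
      } else {
        groups.push({ _id: word, count: 1 });
      }
    }
  }
  return groups.sort((a, b) => b.count - a.count);
}

// "base" ignores case and diacritics (assumed analogue of strength 1).
console.log(countWords(titles, "base"));
// "accent" ignores case but keeps diacritics (assumed analogue of strength 2).
console.log(countWords(titles, "accent"));
```

With "base", "méta" and "meta" fall into one group of 2; with "accent" they are counted separately, while "Title" and "title" are still merged in both cases.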
felix answered Nov 15 '22 00:11