I'm trying to make my MongoDB collection searchable. I can run text searches after creating a text index on the collection:
db.products.createIndex({title: 'text'})
Is it possible to retrieve a list of all the index terms for this collection? This would be very useful for autocompletion and for spell checking and correction while people are typing their search queries.
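There is no built-in function for this in MongoDB. However, you can get equivalent information with an aggregation query. Note that this counts the raw words in the title field, not the stemmed terms stored in the text index.
Let's assume that your collection contains the following documents: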
{ "_id" : ObjectId("5874dbb1a1b342232b822827"), "title" : "title" }
{ "_id" : ObjectId("5874dbb8a1b342232b822828"), "title" : "new title" }
{ "_id" : ObjectId("5874dbbea1b342232b822829"), "title" : "hello world" }
{ "_id" : ObjectId("5874dbc6a1b342232b82282a"), "title" : "world title" }
{ "_id" : ObjectId("5874dbcaa1b342232b82282b"), "title" : "world meta" }
{ "_id" : ObjectId("5874dbcea1b342232b82282c"), "title" : "world meta title" }
{ "_id" : ObjectId("5874de7fa1b342232b82282e"), "title" : "something else" }
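This query gives the word counts:
db.products.aggregate([
    // Split each title into an array of space-separated words
    { $project: { words: { $split: ["$title", " "] } } },
    // Emit one document per word
    { $unwind: "$words" },
    // Count how many times each word occurs
    { $group: { _id: "$words", count: { $sum: 1 } } },
    // Most frequent words first
    { $sort: { count: -1 } }
])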
This outputs the number of occurrences of each word:
{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "meta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }
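To make the stages easier to follow, here is a minimal sketch of the same logic in plain JavaScript (Node.js), with no database involved. The names `countWords` and `docs` are hypothetical and only serve the illustration; the real work is still done by the aggregation pipeline above.

```javascript
// Plain JavaScript sketch of the $project/$split, $unwind, $group and $sort stages.
// `countWords` and `docs` are hypothetical names used only for illustration.
function countWords(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $split + $unwind: visit every space-separated word of the title
    for (const word of doc.title.split(" ")) {
      // $group with { $sum: 1 }: increment the counter for this word
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // $sort: { count: -1 }: most frequent words first
  return [...counts.entries()]
    .map(([word, count]) => ({ _id: word, count }))
    .sort((a, b) => b.count - a.count);
}

// Same titles as the sample documents above
const docs = [
  { title: "title" },
  { title: "new title" },
  { title: "hello world" },
  { title: "world title" },
  { title: "world meta" },
  { title: "world meta title" },
  { title: "something else" },
];

const result = countWords(docs);
console.log(result);
```

The order of words that share the same count is not guaranteed to match the MongoDB output, since neither sort breaks ties on the word itself.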
If you are using MongoDB 3.4 or later, you can get case-insensitive and diacritic-insensitive statistics on the words with the collation option.
For example, let's assume that our collection now contains the following documents:
{ "_id" : ObjectId("5874e057a1b342232b82282f"), "title" : "title" }
{ "_id" : ObjectId("5874e05ea1b342232b822830"), "title" : "new Title" }
{ "_id" : ObjectId("5874e067a1b342232b822831"), "title" : "hello world" }
{ "_id" : ObjectId("5874e076a1b342232b822832"), "title" : "World Title" }
{ "_id" : ObjectId("5874e085a1b342232b822833"), "title" : "World méta" }
{ "_id" : ObjectId("5874e08ea1b342232b822834"), "title" : "World meta title" }
{ "_id" : ObjectId("5874e0aea1b342232b822835"), "title" : "something else" }
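Add the collation option to the aggregation query:
db.products.aggregate([
    // Split each title into an array of space-separated words
    { $project: { words: { $split: ["$title", " "] } } },
    // Emit one document per word
    { $unwind: "$words" },
    // Count occurrences; equality of words follows the collation below
    { $group: { _id: "$words", count: { $sum: 1 } } },
    // Most frequent words first
    { $sort: { count: -1 } }
],
{
    // strength 1 ignores both case and diacritics when comparing words
    collation: { locale: "en_US", strength: 1 }
})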
This will output:
{ "_id" : "title", "count" : 4 }
{ "_id" : "world", "count" : 4 }
{ "_id" : "méta", "count" : 2 }
{ "_id" : "else", "count" : 1 }
{ "_id" : "something", "count" : 1 }
{ "_id" : "new", "count" : 1 }
{ "_id" : "hello", "count" : 1 }
The strength is the level of comparison to perform:
collation.strength: 1 // case insensitive + diacritic insensitive
collation.strength: 2 // case insensitive only
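As a rough analogy for the two strength levels, the sketch below uses JavaScript's built-in `Intl.Collator` in Node.js. It assumes that `sensitivity: "base"` behaves like strength 1 and `sensitivity: "accent"` like strength 2; this is an illustration of the comparison rules, not MongoDB's own implementation, and `groupWords` is a hypothetical helper.

```javascript
// Assumption: sensitivity "base" ~ strength 1, sensitivity "accent" ~ strength 2.
const strength1 = new Intl.Collator("en", { sensitivity: "base" });
const strength2 = new Intl.Collator("en", { sensitivity: "accent" });

// Hypothetical helper: groups words that the given collator considers equal,
// keeping the first spelling seen as the group key.
function groupWords(words, collator) {
  const groups = [];
  for (const word of words) {
    const hit = groups.find((g) => collator.compare(g._id, word) === 0);
    if (hit) {
      hit.count += 1;
    } else {
      groups.push({ _id: word, count: 1 });
    }
  }
  return groups;
}

const words = ["title", "Title", "meta", "méta"];

// Ignores case and diacritics: "title"/"Title" merge, "meta"/"méta" merge
const base = groupWords(words, strength1);

// Ignores case only: "title"/"Title" merge, "meta" and "méta" stay separate
const accent = groupWords(words, strength2);

console.log(base, accent);
```

With strength 1, "meta" and "méta" fall into one group, which is why the output above reports a single entry with a count of 2.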