MongoDB version: 3.4.4
Documents in the MongoDB collection were created from XML files (not GridFS) and look like this one:
{
...
"СвНаимЮЛ" : {
"@attributes" : {
"НаимЮЛПолн" : "ОБЩЕСТВО С ОГРАНИЧЕННОЙ ОТВЕТСТВЕННОСТЬЮ \"КОНСАЛТИНГОВАЯ КОМПАНИЯ \"ГОТЛИБ ЛИМИТИД\"",
...
},
...
}
...
}
The language is Russian. The collection has about 10,000,000 documents and a text index on the field "СвНаимЮЛ.@attributes.НаимЮЛПолн".
Searching for a single word is very fast:
db.records.find({
$text: {
$search: "ГОТЛИБ"
}
})
But searching for several words with logical AND is so slow that I can't even wait for it to finish to get the explain('executionStats') results.
E.g. the next query is very slow. Find all documents which contain the words "ГОТЛИБ" AND "ЛИМИТИД":
db.records.find({
$text: {
$search: "\"ГОТЛИБ\" \"ЛИМИТИД\""
}
})
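To make the quoting in the AND query above explicit, here is a minimal sketch of how the $search string can be built from a list of words. The helper name buildAndSearch is hypothetical (it is not part of MongoDB); it only wraps each word in double quotes so that $text treats every word as a required phrase.

```javascript
// Hypothetical helper (not a MongoDB API): wraps every word in double quotes
// so that $text requires each of them, i.e. a logical AND.
// Any double quotes inside a word are stripped to keep the string well-formed.
function buildAndSearch(words) {
  return words
    .map(function (w) { return '"' + w.replace(/"/g, '') + '"'; })
    .join(' ');
}

var query = { $text: { $search: buildAndSearch(['ГОТЛИБ', 'ЛИМИТИД']) } };

// In the mongo shell this would be used as: db.records.find(query)
```

This only changes how the query is written, not how fast it runs.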
Searching by phrase is also slow. E.g. find all documents which contain the phrase "ГОТЛИБ ЛИМИТИД":
db.records.find({
$text: {
$search: "\"ГОТЛИБ ЛИМИТИД\""
}
})
getIndexes() output:
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "egrul.records"
},
...
{
"v" : 2,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "СвНаимЮЛ.@attributes.НаимЮЛПолн_text",
"ns" : "egrul.records",
"default_language" : "russian",
"weights" : {
"СвНаимЮЛ.@attributes.НаимЮЛПолн" : 1
},
"language_override" : "language",
"textIndexVersion" : 3
}
]
Can I somehow speed up searching by several words (with logical AND) or by phrase?
I just found that searching for multiple words with logical OR is also slow:
db.records.find({
$text: {
$search: "ГОТЛИБ ЛИМИТИД"
}
})
It looks like the problem is not that searching for multiple words is slow, but that the search is slow whenever a search term appears in many documents.
E.g. the word "МИЦУБИСИ" appears in only 24 (out of 10,000,000) documents, so the query
db.records.find({
$text: {
$search: "МИЦУБИСИ"
}
}).count()
is very fast.
But the word "СЕРВИС" appears in 160,000 documents and the query
db.records.find({
$text: {
$search: "СЕРВИС"
}
}).count()
is very slow (takes about 40 minutes).
Query
db.records.find({
$text: {
$search: "\"МИЦУБИСИ\" \"СЕРВИС\""
}
}).count()
is also slow because (I suppose) MongoDB looks up the terms "МИЦУБИСИ" (fast) and "СЕРВИС" (slow) and then computes an intersection or something similar.
Now I want to find a way to limit the number of results, something like "find 10 documents and stop", because limit() doesn't work with text queries.
Or maybe upgrade my server hardware.
Or look at Elasticsearch.
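For the "find 10 documents and stop" idea, one thing that may be relevant: in the mongo shell, cursor.count() ignores limit() unless it is called as count(true), and the underlying count command accepts a limit field. Below is a rough sketch of building such a command. The helper name buildLimitedCount is hypothetical, and whether the server actually stops scanning the text index early is an assumption I have not verified.

```javascript
// Hypothetical helper (not a MongoDB API): builds a `count` command document
// with a `limit`, so that at most `limit` matching documents are counted.
// Assumption: the server can stop early once the limit is reached; this has
// not been measured on the collection described above.
function buildLimitedCount(collection, term, limit) {
  return {
    count: collection,
    query: { $text: { $search: term } },
    limit: limit
  };
}

var cmd = buildLimitedCount('records', 'СЕРВИС', 10);

// In the mongo shell this would be run as: db.runCommand(cmd)
// A shell-level equivalent would be:
//   db.records.find({ $text: { $search: "СЕРВИС" } }).limit(10).count(true)
```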