Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB text index search by multiple words is too slow

Tags:

mongodb

Problem Description

MongoDB version: 3.4.4

Documents in the MongoDB collection were created from the XML files (not GridFS) and look like this one:

{
    ...
    "СвНаимЮЛ" : {
        "@attributes" : {
            "НаимЮЛПолн" : "ОБЩЕСТВО С ОГРАНИЧЕННОЙ ОТВЕТСТВЕННОСТЬЮ \"КОНСАЛТИНГОВАЯ КОМПАНИЯ \"ГОТЛИБ ЛИМИТИД\"",
            ...
        },
        ...
    }
    ...
}

Language is Russian. Collection has about 10,000,000 documents and a text index on the field "СвНаимЮЛ.@attributes.НаимЮЛПолн".

Search by one word is very fast:

db.records.find({
    $text: {
        $search: "ГОТЛИБ"
    }
})

But search by several words with logical AND is so slow that I can't even wait until it ends to get explain('executionStats') results.

E.g. next query is very slow. Find all documents which contain words "ГОТЛИБ" AND "ЛИМИТИД":

db.records.find({
    $text: {
        $search: "\"ГОТЛИБ\" \"ЛИМИТИД\""
    }
})

Search by phrase is also slow. E.g find all documents which contain phrase "ГОТЛИБ ЛИМИТИД":

db.records.find({
    $text: {
        $search: "\"ГОТЛИБ ЛИМИТИД\""
    }
})

getIndexes() output:

[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "egrul.records"
        },

        ...

        {
                "v" : 2,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "name" : "СвНаимЮЛ.@attributes.НаимЮЛПолн_text",
                "ns" : "egrul.records",
                "default_language" : "russian",
                "weights" : {
                        "СвНаимЮЛ.@attributes.НаимЮЛПолн" : 1
                },
                "language_override" : "language",
                "textIndexVersion" : 3
        }
]

Question

Can I somehow increase search-by-several-words (with logical AND) or search-by-phrase speed?

Edited

Just found that search by multiple words with logical OR is also slow:

db.records.find({
    $text: {
        $search: "ГОТЛИБ ЛИМИТИД"
    }
})
like image 601
peresmeshnik Avatar asked Oct 30 '22 06:10

peresmeshnik


1 Answers

Looks like the problem is not with slow search-by-multiple-words, but with slow search if search term appears in many documents.

E. g. the word "МИЦУБИСИ" appears only in 24 (from 10,000,000) documents so the query

db.records.find({
    $text: {
        $search: "МИЦУБИСИ"
    }
}).count()

is very fast.

But the word "СЕРВИС" appears in 160,000 documents and the query

db.records.find({
    $text: {
        $search: "СЕРВИС"
    }
}).count()

is very slow (takes about 40 minutes).

Query

db.records.find({
    $text: {
        $search: "\"МИЦУБИСИ\" \"СЕРВИС\""
    }
}).count()

is also slow because (I suppose) MongoDB looks for terms "МИЦУБИСИ" (fast) and "СЕРВИС" (slow) and then make intersection or something.

Now I want to find a way to limit the number of results something like find 10 documents and stop because limit() doesn't work with text queries. .

Or maybe upgrade my server hardware.

Or look at the Elasticsearch.

like image 124
peresmeshnik Avatar answered Nov 15 '22 10:11

peresmeshnik