Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB search and pagination Aggregation Performance issue

I'm new in node js and MongoDB. I'm working on MongoDB search and pagination which is working good, but I have an issue with performance. it is taking too much time in counting and search records.

if I use small word to search then it works faster, if I use "long string" or "no record in database" then it takes too much time which is 50 to 186.30 seconds. (it is too much time, I'm expecting it to be 1 to 2 seconds).

I have more than 15,00,000 data on my record.

If I do not include count of the search word. it is takes 0.20 to 1.5 seconds, but when I count records while searching word it takes 25.0 to 35.0 seconds.

I have no idea how to decrease this time for counting records with the search word(query optimization).

I tried max level of query optimization.

I have also tried with

{
  $count: "passing_scores"
}

but no change on time. I'm stuck on it. I have to decrease the time of count with the search word.

SQL Query for example

  SELECT * FROM `post`
    Left JOIN catagory ON post.catid=catagory.id
    WHERE post_name LIKE '%a%' OR post_data LIKE '%a%' OR tags LIKE '%a%' OR post_url LIKE '%a%'

NODE and MongoDB

PostObj.count({},function(err,totalCount) {
        if(err) {
            response = {"error" : true,"message" : "Error fetching data"}
        }
        PostObj.aggregate([
        { $lookup:
                {
                    from: 'catagories',
                    localField: 'catagory.catagory_id',
                    foreignField: '_id',
                    as: 'catagories_data'
                }
        },
        {

            $match:
                {
                    $or: [
                        {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                        {"postname": { $regex: new RegExp(search_data) }},
                        {"posturl": { $regex: new RegExp(search_data) }},
                        {"postdata": { $regex: new RegExp(search_data) }},
                        {"tags": { $regex: new RegExp(search_data) }}
                    ]
                }
        },            
        { $limit : search_limit },
        { $skip : search_skip },
        { $group : { _id : "$_id", postname: { $push: "$postname" } , posturl: { $push: "$posturl" }  } } 
    ]).exec(function (err, data){  

        //end insert log data        
        if(err) {
            response = {"error" : true,"message" :err};
        } 

        if(search_data != "")
        {
            // count record using search word

            PostObj.aggregate([
                    { $lookup:
                        {
                            from: 'catagories',
                            localField: 'catagory.catagory_id',
                            foreignField: '_id',
                            as: 'catagories_data'
                        }
                },
                {

                    $match:
                        {
                            $or: [
                                {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                                {"postname": { $regex: new RegExp(search_data) }},
                                {"posturl": { $regex: new RegExp(search_data) }},
                                {"postdata": { $regex: new RegExp(search_data) }},
                                {"tags": { $regex: new RegExp(search_data) }}
                            ]
                        }
                },    
                { $group: { _id: null, myCount: { $sum: 1 } } },
                { $project: { _id: 0 } }   
            ]).exec(function (err, Countdata){  
                res.json({
                sEcho : req.body.draw,
                iTotalRecords: Countdata.myCount,
                iTotalDispla,yRecords: Countdata.myCount,
                aaData: data
            });
        }

        res.json({
            sEcho : req.body.draw,
            iTotalRecords: totalPages,
            iTotalDisplayRecords: totalPages,
            aaData: data
        });
    });
});

Also, I have to try this way but it is tack 35.0 to 49.0 seconds more than 1st code.

PostObj.aggregate([
    { $lookup:
               {
                            from: 'catagories',
                            localField: 'catagory.catagory_id',
                            foreignField: '_id',
                            as: 'catagories_data'
                        }
                },
                {

                    $match:
                        {
                            $or: [
                                {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                                {"postname": { $regex: new RegExp(search_data) }},
                                {"posturl": { $regex: new RegExp(search_data) }},
                                {"postdata": { $regex: new RegExp(search_data) }},
                                {"tags": { $regex: new RegExp(search_data) }}
                            ]
                        }
                }, 
    { '$facet'    : {
        metadata: [ { $count: "total" }, { $addFields: { page: NumberInt(3) } } ],
        data: [ { $skip: 20 }, { $limit: 10 } ] // add projection here wish you re-shape the docs
    } }
] )

If I do not use search word it is work good. I have an issue with when searching any word(count of records of that work without skip and limit)

collection data

Post

 {
   "_id": ObjectId("5d29bd7609f28633f38ccc13"),
   "postname": "this is some data ",
   "tags " : "
   Damita,
   Caro,
   Leontyne,
   Theodosia,
   Vyky ",
   "postdata ": "Berry Samara Kellia Rebekah Linette Hyacinthie Joelly Micky Tomasina Christian Fae Doralynn Chelsea Aurie Gwendolyn Tate
   Cairistiona Ardys Aubrie Damita Olga Kelli Leone Marthena Kelcy
   Cherlyn Molli Pris Ginelle Sula Johannah Hedwig Adelle Editha Lindsey
   Loleta Lenette Ann Heidie Drona Charlena Emilia Manya Ketti Dorthea
   Jeni Lorene Eolanda Karoly Loretta Marylou Tommie Leontyne Winny Cyb
   Violet Pavia Karen Idelle Betty Doloritas Judye Aretha Quinta Billie
   Vallie Fiona Letty Gates Shandra Rosemary Dorice Doro Coral Tove Crin
   Bobbe Kristan Tierney Gianina Val Daniela Kellyann Marybeth Konstance
   Nixie Andeee Jolene Patrizia Carla Arabella Berna Roseline Lira Cristy
   Hedi Clem Nerissa ",
   "catagory " : [
     { "catagory_id " : [ ObjectId("5d29bd7509f28633f38ccbfd")]},
     { "catagory_id": [ ObjectId("5d29bd7509f28633f38ccbfd") ]}],
   "createby": "5d22f712fe481b2a9afda4aa"
 }

catagory

{
  "_id": ObjectId("5d29bc271a68fb333531f6a1"),
  "catagory_name": "Katharine",
  "catagory_description": "Katharine"
}

Any solution for it?

like image 872
JON Avatar asked Jul 17 '19 05:07

JON


People also ask

Is aggregation fast in MongoDB?

On large collections of millions of documents, MongoDB's aggregation was shown to be much worse than Elasticsearch. Performance worsens with collection size when MongoDB starts using the disk due to limited system RAM. The $lookup stage used without indexes can be very slow.

Is aggregation faster than find?

Without seeing your data and your query it is difficult to answer why aggregate+sort is faster than find+sort. A well indexed(Indexing that suits your query) data will always yield faster results on your find query.

Is MongoDB good for aggregate?

In MongoDB, aggregation operations process the data records/documents and return computed results. It collects values from various documents and groups them together and then performs different types of operations on that grouped data like sum, average, minimum, maximum, etc to return a computed result.

Is lookup slow in MongoDB?

It is slow because it is not using an index. For each document in the logs collection, it is doing a full collection scan on the graphs collection.


2 Answers

If in your case, your regex is just looking for a (or few) word(s), then it would be better to use $text instead of $regex. $text can use text index and is thus much faster. In terms of MySQL, $text is LIKE and $regex is REGEXP. Since in your example mysql query you are using LIKE, I'm pretty confident you can go for $text instead of $regex, in your mongo query as well.

You need to have (if not already) a compound "text" index on your fields - (postname, tags, postdata and posturl).

db.POST.createIndex(
   {
     postname: "text",
     tags: "text",
     posturl: "text",
     postdata: "text"
   }
 )
like image 147
Rajat Goel Avatar answered Nov 11 '22 00:11

Rajat Goel


There are some tips that i can suggest you try.

1: POST collection

it seems you are storing only category_id inside your category array of objects property, which you should avoid. instead what you should do is as below.

create new property post_id inside category collection instead of array of object of category in post collection in [ high performance approach ].

OR

convert category property of post collection form array of object to simple array. [ average performance ]. Ex: category: [ ObjectId("5d29bd7509f28633f38ccbfd", ObjectId("5d29bd7509f28633f38ccbfd", ObjectId("5d29bd7509f28633f38ccbfd"];

definitely in both the cases post_id or category property must be indexed.

2: lookup

instead using simple lookup pipeline you should use pipeline approach

Eg:

NOT GOOD.

$lookup:{
    from: 'catagories',
    localField: 'catagory.catagory_id', // BAD IDEA //
    foreignField: '_id',
    as: 'catagories_data'
},

GOOD.

$lookup:{
    from: 'catagories',
    localField: '_id',
    foreignField: 'post_id',  // GOOD IDEA
    as: 'catagories_data'
},

EVEN BETTER


$lookup:{
    let : { post_id: "$_id" },
    from: 'catagories',
    pipeline:[
              {
                    $match: {
                        $expr: {
                            $and: [
                                { $eq: ["$post_id", "$$post_id"], },
                            ]
                        }
                    },
                },
                {
                    $match: {
                        $or: [

                            // AVOID `new` keyword if you can do such;
                            // and create indexes for the same;

                            { "catagory_name": { $regex: `^${search_data}` } },
                            { "postname": { $regex: `^${search_data}` } },
                            { "posturl": { $regex: `^${search_data}` } },
                            { "postdata": { $regex: `^${search_data}` } },
                            { "tags": { $regex: `^${search_data}` } }
                        ]
                    }

                }
    ],
    as: 'catagories_data'
},

After All facet pipeline seems fine to me.

'$facet' : {
    metadata: [ { $count: "total" }, { $addFields: { page: NumberInt(3) } } ],
    data: [ { $skip: 20 }, { $limit: 10 } ] // add projection here wish you re-shape the docs
}

Other aspects of slowdown query depends on

  • configuration of your backend server and database server.
  • distance between frontend -> backend -> database server.
  • incoming and outgoing request per second.
  • internet connection of course

Complete Query will look like this

PostObj.aggregate([
    {
        $lookup: {
            let: { post_id: "$_id" },
            from: 'categories',
            pipeline: [
                {
                    $match: {
                        $expr: {
                            $and: [
                                { $eq: ["$post_id", "$$post_id"], },
                            ]
                        }
                    },
                },
                {
                    $match: {
                        $or: [

                            // AVOID `new` keyword if you can do such;
                            // and create indexes for the same;

                            { "catagory_name": { $regex: `^${search_data}` } },
                            { "postname": { $regex: `^${search_data}` } },
                            { "posturl": { $regex: `^${search_data}` } },
                            { "postdata": { $regex: `^${search_data}` } },
                            { "tags": { $regex: `^${search_data}` } }
                        ]
                    }

                }
            ],
            as: "catagories_data"
        }
    },
    {
        '$facet': {
            metadata: [{ $count: "total" }, { $addFields: { page: NumberInt(3) } }],
            catagories_data: [{ $skip: 0 }, { $limit: 10 }]
        }
    }
])
like image 38
Shivam Mishra Avatar answered Nov 11 '22 02:11

Shivam Mishra