Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB - aggregate and concatenate results to group

I have some items collection like this:

[
    { name: 'item1', description: 'description #1', categories: 'cat_A; cat_B'},
    { name: 'item2', description: 'description #2', categories: 'cat_B'},
    { name: 'item3', description: 'description #3', categories: 'cat_C; cat_B'},
    { name: 'item4', description: 'description #4', categories: 'cat_B; cat_A'},
    { name: 'item5', description: 'description #5', categories: 'cat_B'},
    { name: 'item6', description: 'description #6', categories: 'cat_D'}
]

I want to find and filter results by categories. I created mongo query:

db.getCollection('items')
    .aggregate([
        {
            $match: {
                categories: {$in: [/cat_a/i, /cat_b/i]}
            }
        }, {
            $group: {
                _id: "$categories",
                items: { $push:  { name: "$name", description: '$description' } }
            }
        }
    ])

So it returns me this:

result : [
    {
        "_id" : "cat_B; cat_C",
        "items" : [
            {
                "name" : "item3",
                "description" : "description #3"
            }
        ]
    }, {
        "_id" : "cat_B; cat_A",
        "items" : [
            {
                "name" : "item4",
                "description" : "description #4"
            }
        ]
    }, {
        "_id" : "cat_B",
        "items" : [
            {
                "name" : "item2",
                "description" : "description #2"
            },
            {
                "name" : "item5",
                "description" : "description #5"
            }
        ]
    }, {
        "_id" : "cat_A; cat_B",
        "items" : [
            {
                "name" : "item1",
                "description" : "description #1"
            }
        ]
    }
]

What I want to achieve is:

result : [
    {
        "_id" : "cat_A",
        "items" : [
            {
                "name" : "item1",
                "description" : "description #1"
            },
            {
                "name" : "item4",
                "description" : "description #4"
            }
        ]
    }, {
        "_id" : "cat_B",
        "items" : [
            {
                "name" : "item1",
                "description" : "description #1"
            },
            {
                "name" : "item2",
                "description" : "description #2"
            },
            {
                "name" : "item3",
                "description" : "description #3"
            },
            {
                "name" : "item4",
                "description" : "description #4"
            },
            {
                "name" : "item5",
                "description" : "description #5"
            }
        ]
    }
]

Is that possible in pure mongo query?

like image 473
zucker Avatar asked Oct 30 '22 01:10

zucker


1 Answers

With the aggregation framework you're going to need a mechanism to split the categories string into a distinct set but such an operator does not exist yet; the closest you will get is the substr operator which would require knowing the index of the index position and specified number of characters for the substring to be extracted, which would be near impossible. Hence the recommendation of storing the categories as an array of distinct category names.

--EDIT--

Should you want to keep the categories field as is then I would suggest you create an extra field that stores the list of categories, then you can run the aggregation pipeline on that field to get the desired result.

Let's use an example to demonstrate the approaches above:

Changing the schema

a) If using MongoDB v3.0 or below:

var bulk = db.items.initializeOrderedBulkOp(),
    counter = 0;

db.items.find({}).forEach(doc) {
    var categoriesList = doc.categories.replace(/^\s+|\s+$/g,"").split(/\s*;\s*/);
    bulk.find({ "_id": doc._id })
        .updateOne({ 
            "$set": { "categoriesList": categoriesList } 
        });
    counter++;

    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db.items.initializeOrderedBulkOp();
    }
} 

if (counter % 1000 != 0 ) bulk.execute();

b) If using MongoDB v3.2.X or above:

var cursor = db.items.find({}),
    bulkUpdateOps = [];

cursor.forEach(function(doc){ 
    var categoriesList = doc.categories.replace(/^\s+|\s+$/g,"").split(/\s*;\s*/);
    bulkUpdateOps.push({ 
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "categoriesList": categoriesList } }
         }
    });

    if (bulkUpdateOps.length == 1000) {
        db.items.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});         

if (bulkUpdateOps.length > 0) db.items.bulkWrite(bulkUpdateOps); 

Running the aggregation on new schema

db.items.aggregate([
    { "$match": { "categoriesList": { "$in": ['cat_A', 'cat_B'] } } },
    { "$unwind": "$categoriesList" },
    {
        "$group": {
            "_id": "$categoriesList",
            "items": { "$push":  { "name": "$name", "description": '$description' } }
        }
    }
])
like image 52
chridam Avatar answered Nov 15 '22 09:11

chridam