Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill missing dates in records

I have a collection of ProductViews:

{
    productId: "5b8c0f3204a10228b00a1745",
    createdAt: "2018-09-07T17:18:40.759Z"
}

And I have a query for fetching the daily views for a specific product:

ProductView.aggregate([
    { $match: { productId } },
    { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
    {
        $group: {
            _id: "$day",
            count: { $sum: 1 },
            time: { $avg: "$createdAt" },
        }
    },
    { $sort: { _id: 1 } },
    {
        $project: {
            date: '$_id',
            views: '$count',
        },
    },
]).exec((err, result) => ...)

which currently gives:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

Issue:

The issue is, that this aggregation does not return { date: '2018-09-03', views: 0 } for days with 0 views. This results in incorrect displaying of the data: [![enter image description here][1]][1]

Results should look like:

[
    { date: '2018-09-01', views: 1 },
    { date: '2018-09-02', views: 3 },
    { date: '2018-09-03', views: 0 }, <=
    { date: '2018-09-04', views: 2 },
    { date: '2018-09-05', views: 5 },
    // ...
]

P.S.: It would be perfect to pass in the start and end dates to output results based on this range [1]: https://i.stack.imgur.com/uHPBs.png

like image 346
Flo Avatar asked Sep 08 '18 11:09

Flo


2 Answers

You need few additional stages to return default values. First of all you need to use $group with _id set to null to collect all results in one document. Then you can use $map with an array of days as an input. Inside that $map you can use $indexOfArray to find if that date exists in your current result set. If yes (index != -1) then you can return that value, otherwise you need to return default subdocument with views set to 0. Then you can use $unwind to get back a list of documents and $replaceRoot to promote nested stats to a top level.

ProductView.aggregate([
    { $match: { productId: '5b8c0f3204a10228b00a1745' } },
    { $project: { day: { $substr: ["$createdAt", 0, 10] } } },
    {
        $group: {
            _id: "$day",
            count: { $sum: 1 },
            time: { $avg: "$createdAt" },
        }
    },
    { $sort: { _id: 1 } },
    {
        $project: {
            date: '$_id',
            views: '$count',
        },
    },
    {
        $group: {
            _id: null,
            stats: { $push: "$$ROOT" }
        }
    },
    {
        $project: {
            stats: {
                $map: {
                    input: [ "2018-09-01", "2018-09-02", "2018-09-03", "2018-09-04", "2018-09-05" ],
                    as: "date",
                    in: {
                        $let: {
                            vars: { dateIndex: { "$indexOfArray": [ "$stats._id", "$$date" ] } },
                            in: { 
                                $cond: {
                                    if: { $ne: [ "$$dateIndex", -1 ] },
                                    then: { $arrayElemAt: [ "$stats", "$$dateIndex" ] },
                                    else: { _id: "$$date", date: "$$date", views: 0 }
                                } 
                            }
                        }
                    }
                }
            }
        }
    },
    {
        $unwind: "$stats"
    },
    {
        $replaceRoot: {
            newRoot: "$stats"
        }
    }
]).exec((err, result) => ...)

You can generate a static list of dates in your application logic using simple loop. I believe that's possible in MongoDB as well (using $range) but it might complicate this aggregation pipeline. Let me know if you're fine with that or you want to try to generate that array of dates in MongoDB.

like image 194
mickl Avatar answered Oct 03 '22 14:10

mickl


Starting in Mongo 5.1, it's a perfect use case for the new $densify aggregation operator:

// { date: ISODate("2018-09-01"), views: 1 }
// { date: ISODate("2018-09-02"), views: 3 }
// { date: ISODate("2018-09-04"), views: 2 }
// { date: ISODate("2018-09-05"), views: 5 }
db.collection.aggregate([
  { $densify: {
    field: "date",
    range: { step: 1, unit: "day", bounds: "full" }
  }},
  { $set: { views: { $cond: [ { $not: ["$views"] }, 0, "$views" ] } } }
])
// { date: ISODate("2018-09-01"), views: 1 }
// { date: ISODate("2018-09-02"), views: 3 }
// { date: ISODate("2018-09-03"), views: 0 } <=
// { date: ISODate("2018-09-04"), views: 2 }
// { date: ISODate("2018-09-05"), views: 5 }

This:

  • densifies documents ($densify) by creating new documents in a sequence of documents where certain values for a field (in our case field: "date") are missing:
    • the step for our densification is 1 day: range: { step: 1, unit: "day", ... }
    • and we densify within the range of dates defined by our documents: bounds: "full"
  • finally sets ($set) views to 0 only for new documents included during the densify stage ({ views: { $cond: [ { $not: ["$views"] }, 0, "$views" ] })

Of course, to make this work with your specific example, you can switch from dates to strings with $dateToString ($dateToString: { format: "%Y-%m-%d", date: "$date" }) and back to dates with $dateFromString ($dateFromString: { dateString: "$date" })


And concerning your P.S. (pass in the start and end dates to output results based on this range), you can replace bounds: "full" with bounds: [ISODate("2018-08-25"), ISODate("2018-09-07")]

like image 42
Xavier Guihot Avatar answered Oct 03 '22 14:10

Xavier Guihot