
Group by date intervals

I have a collection with documents like this:

{ datetime: new Date(), count: 1234 }

I want to get sums of count over the last 24 hours, 7 days and 30 days.

The result should be like:

{ "sum": 100,  "interval": "day" }
{ "sum": 700,  "interval": "week" }
{ "sum": 3000, "interval": "month" }

In more abstract terms, I need to group results by multiple conditions (in this case, multiple time intervals).

The MySQL equivalent would be:

SELECT 
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 24 HOUR, 1, 0) last_day,
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 168 HOUR, 1, 0) last_week,
    IF (time>CURRENT_TIMESTAMP() - INTERVAL 720 HOUR, 1, 0) last_month,
    SUM(count) count
FROM table
GROUP BY    last_day,
            last_week,
            last_month
jonasasx asked Jan 03 '15 01:01


3 Answers

MongoDB's aggregation framework provides date aggregation operators. For example, the $dayOfYear operator extracts the day of the year from a date so that it can be used as a grouping key:

db.collection.aggregate([
    { "$group": {
        "_id": { "$dayOfYear": "$datetime" },
        "total": { "$sum": "$count" }
    }}
])

Or you can use a date math approach instead. Subtracting the epoch date from a date object yields a number of milliseconds, to which ordinary math can then be applied:

db.collection.aggregate([
    { "$group": {
        "_id": { 
            "$subtract": [
                { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                { "$mod": [
                    { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "total": { "$sum": "$count" }
    }}
])
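To see what the nested $subtract / $mod expression computes, the same arithmetic can be sketched in plain JavaScript. This is only an illustration, assuming timestamps in milliseconds since the epoch; the helper name floorToDay is hypothetical and not part of MongoDB.

```javascript
// Milliseconds in one day, the same constant used in the pipeline above.
const DAY_MS = 1000 * 60 * 60 * 24;

// Hypothetical helper: rounds a Date down to the start of its UTC day,
// mirroring  ms - (ms % DAY_MS)  from the $subtract / $mod expression.
function floorToDay(date) {
  const ms = date.getTime();   // equivalent to subtracting new Date("1970-01-01")
  return ms - (ms % DAY_MS);   // drop the time-of-day remainder
}

const sample = new Date("2015-01-03T13:45:00Z");
console.log(new Date(floorToDay(sample)).toISOString());
// 2015-01-03T00:00:00.000Z
```

Every document from the same UTC day yields the same number, which is why the expression works as a grouping key. Note that the key is a number of milliseconds rather than a date.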

If what you are after is intervals measured back from the current point in time, then what you want is the date math approach combined with conditionals via the $cond operator:

db.collection.aggregate([
    { "$match": {
        "datetime": { 
            "$gte": new Date(new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 ))
        }
    }},
    { "$group": {
        "_id": null,
        "24hours": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "30days": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 30 )
                    ]},
                    "$count",
                    0
                ]
            }
        },
        "OneYear": { 
            "$sum": {
                "$cond": [
                    { "$gt": [
                        { "$subtract": [ "$datetime", new Date("1970-01-01") ] },
                        new Date().valueOf() - ( 1000 * 60 * 60 * 24 * 365 )
                    ]},
                    "$count",
                    0
                ]
            }
        }
    }}
])

It's essentially the same approach as the SQL example, where the query conditionally evaluates whether the date value falls within the required range and decides whether or not to add the value to the sum.

The one addition here is the $match stage, which restricts the query to only those items that could possibly fall within the largest range being summed (one year in this example). That makes it a bit better than the presented SQL, in that an index can be used to filter out older values and you don't need to "brute force" through non-matching data in the collection.

It is always a good idea to restrict the input with $match when using an aggregation pipeline.
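For the 24-hour, 7-day and 30-day windows from the question, the same $cond pattern can be generated rather than written out by hand. The following is a minimal sketch, assuming the question's field names datetime and count. buildIntervalSums is a hypothetical helper, and it compares the date field directly against a cutoff Date instead of converting to milliseconds first.

```javascript
const HOUR_MS = 1000 * 60 * 60;

// Hypothetical helper: builds a pipeline that sums `count` over several
// look-back windows. `windows` maps an output field name to a length in hours.
function buildIntervalSums(windows, now = new Date()) {
  const maxHours = Math.max(...Object.values(windows));
  const group = { _id: null };
  for (const [name, hours] of Object.entries(windows)) {
    const cutoff = new Date(now.getTime() - hours * HOUR_MS);
    // Add `count` only when the document is newer than this window's cutoff.
    group[name] = {
      $sum: { $cond: [{ $gt: ["$datetime", cutoff] }, "$count", 0] }
    };
  }
  return [
    // Restrict input to the widest window so an index on `datetime` can help.
    { $match: { datetime: { $gte: new Date(now.getTime() - maxHours * HOUR_MS) } } },
    { $group: group }
  ];
}

const pipeline = buildIntervalSums({ day: 24, week: 168, month: 720 });
// In the mongo shell: db.collection.aggregate(pipeline)
```

The result is a single document with one field per window, for example { _id: null, day: ..., week: ..., month: ... }.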

Neil Lunn answered Nov 02 '22 05:11


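There are two different ways to do this. One is to issue a separate query for each of the ranges. Note that count() only returns the number of matching documents, so to sum the count field each query needs to be a small aggregate() with a $match on the range followed by a $group. This is pretty easy, and if the datetime field is indexed, it will be fast.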

The second way is to combine them all into one query, using a method similar to your SQL example. To do this, use the aggregate() method with a pipeline in which a $project stage creates the 0 or 1 values for the new "last_day", "last_week", and "last_month" fields, and a $group stage then does the sums.
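As a rough sketch of that second way, assuming the question's field names datetime and count (buildFlagPipeline is a hypothetical helper, not something from the answer):

```javascript
// Hypothetical helper mirroring the SQL in the question: flag each document
// with 0/1 per window, then group by the combination of flags.
function buildFlagPipeline(now = new Date()) {
  const HOUR_MS = 1000 * 60 * 60;
  const since = (hours) => new Date(now.getTime() - hours * HOUR_MS);
  // 1 if the document is newer than the cutoff, otherwise 0.
  const flag = (hours) => ({ $cond: [{ $gt: ["$datetime", since(hours)] }, 1, 0] });
  return [
    { $project: {
        count: 1,
        last_day: flag(24),
        last_week: flag(168),
        last_month: flag(720)
    }},
    { $group: {
        _id: { last_day: "$last_day", last_week: "$last_week", last_month: "$last_month" },
        count: { $sum: "$count" }
    }}
  ];
}

// In the mongo shell: db.collection.aggregate(buildFlagPipeline())
```

Like the SQL, this returns one document per combination of flags, so the sums are disjoint. To get the total for the last week, for example, add up the groups whose last_week flag is 1.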

Tom Panning answered Nov 02 '22 05:11


Starting in MongoDB 5.0, this is a nice use case for the $dateDiff operator combined with a $facet stage:

// { date: ISODate("2021-12-04"), count: 3  } <= today
// { date: ISODate("2021-11-29"), count: 5  } <= last week
// { date: ISODate("2021-11-24"), count: 1  } <= last month
// { date: ISODate("2021-11-12"), count: 12 } <= last month
// { date: ISODate("2021-10-04"), count: 8  } <= too old
db.collection.aggregate([

  { $set: {
    diff: { $dateDiff: { startDate: "$$NOW", endDate: "$date", unit: "day" } }
  }},

  { $facet: {
    lastMonth: [
      { $match: { diff: { $gt: -30 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ],
    lastWeek: [
      { $match: { diff: { $gt: -7 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ],
    lastDay: [
      { $match: { diff: { $gt: -1 } } },
      { $group: { _id: null, total: { $sum: "$count" } } }
    ]
  }},

  { $set: {
    lastMonth: { $first: "$lastMonth.total" },
    lastWeek: { $first: "$lastWeek.total" },
    lastDay: { $first: "$lastDay.total" }
  }}
])
// { lastMonth: 21, lastWeek: 8, lastDay: 3 }

This:

  • first computes (with $dateDiff) the number of days of difference between today ("$$NOW") and the document's date

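    • if the date is 3 days ago, diff will be set to -3 ($dateDiff counts the day boundaries crossed, so these are calendar days rather than 24-hour periods)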

    • the intermediate result being:

      { date: ISODate("2021-12-04"), count: 3,  diff: 0   }
      { date: ISODate("2021-11-29"), count: 5,  diff: -5  }
      { date: ISODate("2021-11-24"), count: 1,  diff: -10 }
      { date: ISODate("2021-11-12"), count: 12, diff: -22 }
      { date: ISODate("2021-10-04"), count: 8,  diff: -61 }
      
  • then performs a $facet stage that allows us to run multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its result is stored as an array of documents.

    • this way, we can create a lastMonth field that'll contain the sum of counts ($sum: "$count") for documents whose date is less than 30 days before today ({ $match: { diff: { $gt: -30 } } })

    • while we do the same for lastWeek and lastDay.

    • the intermediate result being:

      {
        lastMonth: [{ _id: null, total: 21 }],
        lastWeek: [{ _id: null, total: 8 }],
        lastDay: [{ _id: null, total: 3 }]
      }
      
  • and finally cleans up the $facet output with a $set stage to get fields in a nice format:

    { lastMonth: 21, lastWeek: 8, lastDay: 3 }
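The totals above can be checked by replaying the same steps in plain JavaScript on the sample documents. This is just a sanity-check sketch: it fixes "today" at 2021-12-04 instead of using $$NOW, and dayDiff and sumSince are hypothetical helper names.

```javascript
// Sample documents from the comments above; "today" is fixed at 2021-12-04
// so the result is reproducible (the pipeline itself uses $$NOW).
const today = new Date("2021-12-04T00:00:00Z");
const docs = [
  { date: new Date("2021-12-04T00:00:00Z"), count: 3 },
  { date: new Date("2021-11-29T00:00:00Z"), count: 5 },
  { date: new Date("2021-11-24T00:00:00Z"), count: 1 },
  { date: new Date("2021-11-12T00:00:00Z"), count: 12 },
  { date: new Date("2021-10-04T00:00:00Z"), count: 8 }
];

const DAY_MS = 1000 * 60 * 60 * 24;

// Whole days from `today` to `date` (negative for past dates). This simple
// version assumes both values are at UTC midnight.
const dayDiff = (date) => Math.round((date.getTime() - today.getTime()) / DAY_MS);

// Sum of `count` for documents whose diff is greater than `minDiff`,
// i.e. one $match + $group sub-pipeline of the $facet stage.
const sumSince = (minDiff) =>
  docs.filter((d) => dayDiff(d.date) > minDiff)
      .reduce((total, d) => total + d.count, 0);

const result = {
  lastMonth: sumSince(-30),
  lastWeek: sumSince(-7),
  lastDay: sumSince(-1)
};
console.log(result);
// { lastMonth: 21, lastWeek: 8, lastDay: 3 }
```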
    
Xavier Guihot answered Nov 02 '22 04:11