Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongo $subtract date doesn't work in aggregation $match block

I am creating a mongo aggregation query which use a $subtract operator in my $match block. As explained in these codes below.

This query doesn't work:

db.coll.aggregate(
[
    {
        $match: {
            timestamp: {
                $gte: {
                    $subtract: [new Date(), 24 * 60 * 60 * 1000]
                }
            }
        }
    },
    {
        $group: {
            _id: {
                timestamp: "$timestamp"
            },
            total: {
                $sum: 1
            }
        }
    },
    {
        $project: {
            _id: 0,
            timestamp: "$_id.timestamp",
            total: "$total",
        }
    },
    {
        $sort: {
            timestamp: -1
        }
    }
]
)

However, this second query work:

db.coll.aggregate(
[
    {
        $match: {
            timestamp: {
                $gte: new Date(new Date() - 24 * 60 * 60 * 1000)
            }
        }
    },
    {
        $group: {
            _id: {
                timestamp: "$timestamp"
            },
            total: {
                $sum: 1
            }
        }
    },
    {
        $project: {
            _id: 0,
            timestamp: "$_id.timestamp",
            total: "$total",
        }
    },
    {
        $sort: {
            timestamp: -1
        }
    }
]
)

I need to use $subtract on my $match block so I can't use the last query.

like image 401
indraep Avatar asked Mar 09 '15 08:03

indraep


3 Answers

As of mongodb 3.6 you can use $subtract in the $match stage via the $expr. Here's the docs: https://docs.mongodb.com/manual/reference/operator/query/expr/

I was able to get a query like what you're describing via this $expr and a new system variable in mongodb 4.2 called $$NOW. Here is my query, which gives me orders that have been created within the last 4 hours:

[ 
 { $match: 
  { $expr: 
   { $gt: [ 
    "$_created_at", 
     { $subtract: [ "$$NOW", 4 * 60 * 60 * 1000] } ] 
   } 
  } 
 }
]
like image 53
Chris Edgington Avatar answered Oct 14 '22 19:10

Chris Edgington


The $subtract operator is a projection-operator. It is only available during a $project step. So your options are:

  • (not recommended) Add a $project-step before your $match-step to convert the timestamp field of all documents for the following match-step. I would not recommend you to do this because this operation needs to be performed on every single document on your database and prevents the database from using an index on the timestamp field, so it could cost you a lot of performance.
  • (recommended) Generate the Date you want to match against in the shell / in your application. Generate a new Date() object, store it in a variable, subtract 24 hours from it and perform your 2nd query using that variable.
like image 36
Philipp Avatar answered Oct 14 '22 20:10

Philipp


Well you cannot do that and you are not meant to do so either. Another valid thing is that you say to "need" to do this but in reality you really do not.

Pretty much all of the general aggregation operators outside of the pipeline operators are really only valid within a $project or a $group pipeline stage. Mostly within $project but certainly not in others.

A $match pipeline is really the same as a general "query" operation, so the only things valid in there are the query operators.

As for the case for your "need", any "value" that is submitted within an aggregation pipeline and particularly within a $match needs to be evaluated outside of the actual pipeline before the BSON representation is sent to the server.

The only exception is the notation that defines variables in the document, particularly "fieldnames" such a "$fieldname" and then only really in $project or $group. So that means something that "refers" to an existing value of a document, and that is something that cannot be done within any type of "query" document expression.

If you need to work with the value of another field in the document then you work it out with $project first, as in:

db.collection.aggregate([
    { "$project": {
        "fieldMath": { "$subtract": [ "$fieldOne", "$fieldTwo" ] }
    }},
    { "$match": { "fieldMath": { "$gt": 2 } }}
])

For any other purpose you really want to evaluate the value "outside" the pipeline.


The above answers the question you asked, but this answers the question you didn't ask.

Your pipeline doesn't make any sense since grouping on the "timestamp" alone would be unlikely to group anything since the values are of millisecond accuracy and there is likely not to be more than just a few at best for very active systems.

It appears like you are looking for the math to group by "day", which you can do like this:

db.collection.aggregate([
    { "$group": {
        "_id": {
            "$subtract": [
                { "$subtract": [ "$timestamp", new Date(0) ] },
                { "$mod": [
                    { "$subtract": [ "$timestamp", new Date(0) ] },
                    1000 * 60 * 60 * 24
                ]}
            ]
        },
        "total": { "$sum": "$total" }
    }}
])

That "rounds" your timestamp value to a single day and has a much better chance of "aggregating" something than you would otherwise have.

Or you can use the "date aggregation operators" to do much the same thing with a composite key.


So if you want to "query" then it evaluates externally. If you want to work on a value "within the document" then you must do so in either a $project or $group pipeline stage.

like image 27
Neil Lunn Avatar answered Oct 14 '22 21:10

Neil Lunn