Mongodb aggregation by day based on unix timestamp

Tags: mongodb, nosql

I have googled a lot but have not found a helpful solution. I want to find the total number of daily users. I have a collection named session_log with documents like the following:

{
    "_id" : ObjectId("52c690955d3cdd831504ce30"),
    "SORTID" : NumberLong(1388744853),
    "PLAYERID" : 3,
    "LASTLOGIN" : NumberLong(1388744461),
    "ISLOGIN" : 1,
    "LOGOUT" : NumberLong(1388744853)
}

I want to aggregate on LASTLOGIN.

This is my query:

db.session_log.aggregate(
    { $group : {
        _id: {
            LASTLOGIN : "$LASTLOGIN"
        },
        count: { $sum: 1 }
    }}
);

But this groups by each individual login time, not by day. Any help would be appreciated.

Ali Mehdi asked Oct 12 '15 10:10


1 Answer

MongoDB 4.0 and newer

Use $toDate:

db.session_log.aggregate([
    { "$group": {
        "_id": {
            "$dateToString": {
                "format": "%Y-%m-%d",
                "date": {
                    "$toDate": { 
                        "$multiply": [1000, "$LASTLOGIN"]
                    }
                }
            }
        },
        "count": { "$sum": 1 }
    } }
])

or $convert:

db.session_log.aggregate([
    { "$group": {
        "_id": {
            "$dateToString": {
                "format": "%Y-%m-%d",
                "date": {
                    "$convert": { 
                        "input":  { 
                            "$multiply": [1000, "$LASTLOGIN"] 
                        }, 
                        "to": "date"
                    }
                }
            }
        },
        "count": { "$sum": 1 }
    } }
])

MongoDB >= 3.0 and < 4.0:

db.session_log.aggregate([
    { "$group": {
        "_id": {
            "$dateToString": {
                "format": "%Y-%m-%d",
                "date": {
                    "$add": [
                        new Date(0), 
                        { "$multiply": [1000, "$LASTLOGIN"] }
                    ]
                }
            }
        },
        "count": { "$sum": 1 }
    } }
])

LASTLOGIN is stored in seconds, so first convert it to a millisecond timestamp by multiplying the value by 1000:

{ "$multiply": [1000, "$LASTLOGIN"] }

Then convert it to a date:

"$add": [
    new Date(0),
    { "$multiply": [1000, "$LASTLOGIN"] }
]
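As a rough illustration of what these two expressions do, here is the same arithmetic in plain JavaScript, outside MongoDB. The helper name secondsToDate is hypothetical and only mirrors the $multiply and $add steps; it is a sketch, not part of any MongoDB API.

```javascript
// Plain JavaScript sketch of the seconds -> Date conversion.
// secondsToDate is a hypothetical helper, not a MongoDB function.
function secondsToDate(seconds) {
  // Mirrors { "$multiply": [1000, "$LASTLOGIN"] }
  const millis = seconds * 1000;
  // Mirrors { "$add": [new Date(0), millis] }: epoch date plus milliseconds
  return new Date(new Date(0).getTime() + millis);
}

const lastLogin = 1388744461; // LASTLOGIN from the sample document
console.log(secondsToDate(lastLogin).toISOString()); // 2014-01-03T10:21:01.000Z

// Skipping the multiplication treats seconds as milliseconds,
// which lands in January 1970 instead of January 2014.
console.log(new Date(lastLogin).toISOString());
```

If the grouped dates all come out in 1970, a missing factor of 1000 is a likely cause.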

This can be done in the $project stage by adding the millisecond value to a zero-millisecond Date(0) object. You can then extract the $year, $month, and $dayOfMonth parts from the converted date and use them in the $group stage to group the documents by day.

You should thus change your aggregation pipeline to this:

var project = {
    "$project":{ 
        "_id": 0,
        "y": {
            "$year": {
                "$add": [
                    new Date(0),
                    { "$multiply": [1000, "$LASTLOGIN"] }
                ]
            }
        },
        "m": {
            "$month": {
                "$add": [
                    new Date(0),
                    { "$multiply": [1000, "$LASTLOGIN"] }
                ]
            }
        }, 
        "d": {
            "$dayOfMonth": {
                "$add": [
                    new Date(0),
                    { "$multiply": [1000, "$LASTLOGIN"] }
                ]
            }
        }
    } 
},
group = {   
    "$group": { 
        "_id": { 
            "year": "$y", 
            "month": "$m", 
            "day": "$d"
        },  
        "count" : { "$sum" : 1 }
    }
};

Running the aggregation pipeline:

db.session_log.aggregate([ project, group ])

would give the following results (based on the sample document):

{ "_id" : { "year" : 2014, "month" : 1, "day" : 3 }, "count" : 1 }
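The result above can be checked by hand in plain JavaScript. This is only a sketch: dayKey is a hypothetical helper that imitates $year, $month, and $dayOfMonth, assuming the operators' default behaviour of working in UTC.

```javascript
// dayKey is a hypothetical helper that imitates the $year, $month and
// $dayOfMonth operators, assuming they evaluate the date in UTC.
function dayKey(seconds) {
  const date = new Date(seconds * 1000); // seconds -> milliseconds -> Date
  return {
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1, // JavaScript months are 0-based, $month is 1-based
    day: date.getUTCDate(),
  };
}

// LASTLOGIN from the sample document
console.log(dayKey(1388744461)); // { year: 2014, month: 1, day: 3 }
```

Because the parts are taken in UTC, a login shortly before or after midnight local time may be counted under the neighbouring day.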

An improvement would be to drop the $project stage and do the conversion directly in a single $group stage:

var group = {   
    "$group": { 
        "_id": {    
            "year": {
                "$year": {
                    "$add": [
                        new Date(0),
                        { "$multiply": [1000, "$LASTLOGIN"] }
                    ]
                }
            },
            "month": {
                "$month": {
                    "$add": [
                        new Date(0),
                        { "$multiply": [1000, "$LASTLOGIN"] }
                    ]
                }
            }, 
            "day": {
                "$dayOfMonth": {
                    "$add": [
                        new Date(0),
                        { "$multiply": [1000, "$LASTLOGIN"] }
                    ]
                }
            }
        },  
        "count" : { "$sum" : 1 }
    }
};

Running the aggregation pipeline:

db.session_log.aggregate([ group ])
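To sanity-check the counts without a database, the grouping can be imitated in memory. The sketch below is plain JavaScript, not a MongoDB query; countByDay is a hypothetical helper that produces output shaped like the $dateToString variants ("%Y-%m-%d" keys in UTC). Only the first document comes from the question; the other two are made-up test values.

```javascript
// In-memory imitation of grouping by UTC day and counting documents.
// countByDay is a hypothetical helper, not part of MongoDB.
function countByDay(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Seconds -> milliseconds -> Date, then keep the "YYYY-MM-DD" prefix (UTC)
    const day = new Date(doc.LASTLOGIN * 1000).toISOString().slice(0, 10);
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  // Same shape as the $group output: one { _id, count } object per day
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const sample = [
  { PLAYERID: 3, LASTLOGIN: 1388744461 }, // from the question's sample document
  { PLAYERID: 4, LASTLOGIN: 1388750000 }, // made-up value, same UTC day
  { PLAYERID: 3, LASTLOGIN: 1388800000 }, // made-up value, next UTC day
];

console.log(countByDay(sample));
// [ { _id: '2014-01-03', count: 2 }, { _id: '2014-01-04', count: 1 } ]
```

Like the pipelines above, this counts login documents per day, so a player who logs in twice on the same day is counted twice.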

chridam answered Oct 11 '22 18:10