Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

taking the difference between adjacent documents in mongoDB

How do I take the difference between adjacent records in mongoDB using javascript? For example, if I have the following three documents in a collection:

{
    "_id" : ObjectId("50ed90a55502684f440001ac"),
    "time" : ISODate("2013-02-13T15:45:41.148Z")
}

{
    "_id" : ObjectId("50ed90a55502684f440001ad"),
    "time" : ISODate("2013-02-13T15:45:42.148Z")
}

{
    "_id" : ObjectId("50ed90a55502684f440001ae"),
    "time" : ISODate("2013-02-13T15:45:45.148Z")
}

I want to take the difference in the "time" field between adjacent values to get:

{
    "_id" : ObjectId("50ed90a55502684f440001ac"),
    "time" : ISODate("2013-02-13T15:45:41.148Z"),
    "time_difference" : null
}

{
    "_id" : ObjectId("50ed90a55502684f440001ad"),
    "time" : ISODate("2013-02-13T15:45:42.148Z"),
    "time_difference" : 1
}

{
    "_id" : ObjectId("50ed90a55502684f440001ae"),
    "time" : ISODate("2013-02-13T15:45:45.148Z"),
    "time_difference" : 3
}

Any ideas on how to do this efficiently in javascript/mongoDB? Thanks.

like image 549
user1910316 Avatar asked Feb 17 '23 12:02

user1910316


2 Answers

I don't know whether this was true when the question was asked seven years ago, but this can be solved completely within the aggregation framework. Assuming the collection name is AdjacentDocument, the following aggregation will get the results you're looking for:

db.AdjacentDocument.aggregate(
    {$sort: {time: 1}},
    {$group: {_id: 0, document: {$push: '$$ROOT'}}},
    {$project: {documentAndPrevTime: {$zip: {inputs: ['$document', {$concatArrays: [[null], '$document.time']}]}}}},
    {$unwind: {path: '$documentAndPrevTime'}},
    {$replaceWith: {$mergeObjects: [{$arrayElemAt: ['$documentAndPrevTime', 0]}, {prevTime: {$arrayElemAt: ['$documentAndPrevTime', 1]}}]}},
    {$set: {time_difference: {$trunc: [{$divide: [{$subtract: ['$time', '$prevTime']}, 1000]}]}}},
    {$unset: 'prevTime'}
);

Aggregation pipeline walkthrough

First, the documents are sorted from oldest to newest. They are grouped into a single document with the documents stored in an ordered array field:

{$sort: {time: 1}},
{$group: {_id: 0, document: {$push: '$$ROOT'}}}

/*
{
    "_id" : 0,
    "document" : [
        {
            "_id" : ObjectId("50ed90a55502684f440001ac"),
            "time" : ISODate("2013-02-13T15:45:41.148Z")
        },
        {
            "_id" : ObjectId("50ed90a55502684f440001ad"),
            "time" : ISODate("2013-02-13T15:45:42.148Z")
        },
        {
            "_id" : ObjectId("50ed90a55502684f440001ae"),
            "time" : ISODate("2013-02-13T15:45:45.148Z")
        }
    ]
}
*/

Next, the previous times are zipped into the document array, creating an array of [document, previousTime]:

{$project: {documentAndPrevTime: {$zip: {inputs: ['$document', {$concatArrays: [[null], '$document.time']}]}}}}

/*
{
    "_id" : 0,
    "documentAndPrevTime" : [
        [
            {
                "_id" : ObjectId("50ed90a55502684f440001ac"),
                "time" : ISODate("2013-02-13T15:45:41.148Z")
            },
            null
        ],
        [
            {
                "_id" : ObjectId("50ed90a55502684f440001ad"),
                "time" : ISODate("2013-02-13T15:45:42.148Z")
            },
            ISODate("2013-02-13T15:45:41.148Z")
        ],
        [
            {
                "_id" : ObjectId("50ed90a55502684f440001ae"),
                "time" : ISODate("2013-02-13T15:45:45.148Z")
            },
            ISODate("2013-02-13T15:45:42.148Z")
        ]
    ]
}
*/

Next, the document & time array is unwound, creating a document for each of the initial documents:

{$unwind: {path: '$documentAndPrevTime'}}

/*
{
    "_id" : 0,
    "documentAndPrevTime" : [
        {
            "_id" : ObjectId("50ed90a55502684f440001ac"),
            "time" : ISODate("2013-02-13T15:45:41.148Z")
        },
        null
    ]
}
{
    "_id" : 0,
    "documentAndPrevTime" : [
        {
            "_id" : ObjectId("50ed90a55502684f440001ad"),
            "time" : ISODate("2013-02-13T15:45:42.148Z")
        },
        ISODate("2013-02-13T15:45:41.148Z")
    ]
}
{
    "_id" : 0,
    "documentAndPrevTime" : [
        {
            "_id" : ObjectId("50ed90a55502684f440001ae"),
            "time" : ISODate("2013-02-13T15:45:45.148Z")
        },
        ISODate("2013-02-13T15:45:42.148Z")
    ]
}
*/

Next, we replace the document with the value of the document array element, merged with previous time element (using null if it's the initial index):

{$replaceWith: {$mergeObjects: [{$arrayElemAt: ['$documentAndPrevTime', 0]}, {prevTime: {$arrayElemAt: ['$documentAndPrevTime', 1]}}]}}

/*
{
    "_id" : ObjectId("50ed90a55502684f440001ac"),
    "time" : ISODate("2013-02-13T15:45:41.148Z"),
    "prevTime" : null
}
{
    "_id" : ObjectId("50ed90a55502684f440001ad"),
    "time" : ISODate("2013-02-13T15:45:42.148Z"),
    "prevTime" : ISODate("2013-02-13T15:45:41.148Z")
}
{
    "_id" : ObjectId("50ed90a55502684f440001ae"),
    "time" : ISODate("2013-02-13T15:45:45.148Z"),
    "prevTime" : ISODate("2013-02-13T15:45:42.148Z")
}
*/

Finally, we update the document by setting the time_difference to the difference of the two time fields, and removing the temporary prevTime field. Since the difference between two dates is in milliseconds and your example uses seconds, we calculate the seconds by dividing by 1000 and truncating.

{$set: {time_difference: {$trunc: [{$divide: [{$subtract: ['$time', '$prevTime']}, 1000]}]}}},
{$unset: 'prevTime'}

/*
{
    "_id" : ObjectId("50ed90a55502684f440001ac"),
    "time" : ISODate("2013-02-13T15:45:41.148Z"),
    "time_difference" : null
}
{
    "_id" : ObjectId("50ed90a55502684f440001ad"),
    "time" : ISODate("2013-02-13T15:45:42.148Z"),
    "time_difference" : 1
}
{
    "_id" : ObjectId("50ed90a55502684f440001ae"),
    "time" : ISODate("2013-02-13T15:45:45.148Z"),
    "time_difference" : 3
}
*/
like image 89
M. Justin Avatar answered Feb 20 '23 09:02

M. Justin


The one thing you will want to make sure of here is that you have a sort on the query you wish to use to garnish your records. If no sort is used it will actually use find order, which is not $natural order.

Find order can differ between queries so if you run the query twice within the period of 2 minutes you might find that they don't return the same order. It does seem however that your query would be logically sorted on tiem_difference.

It should also by noted that this is not possible through normal querying. I also do not see an easy way doing this through the aggregation framework.

So already it seems the next plausible method is either using multiple queries or client side processing. Client side processing is probably the better here using a function like the one defined by @Marlon above.

like image 37
Sammaye Avatar answered Feb 20 '23 08:02

Sammaye