Compute first order derivative with MongoDB aggregation framework

Tags:

python

mongodb

aggregation-framework

mapreduce

pymongo

Is it possible to calculate a first order derivative using the aggregate framework?

For example, I have the data:

{time_series : [10,20,40,70,110]}

I'm trying to obtain an output like:

{derivative : [10,20,30,40]}

asked Aug 15 '16 15:08

user666

2 Answers

db.collection.aggregate(
    [
      {
        "$addFields": {
          "indexes": {
            "$range": [
              0,
              {
                "$size": "$time_series"
              }
            ]
          },
          "reversedSeries": {
            "$reverseArray": "$time_series"
          }
        }
      },
      {
        "$project": {
          "derivatives": {
            "$reverseArray": {
              "$slice": [
                {
                  "$map": {
                    "input": {
                      "$zip": {
                        "inputs": [
                          "$reversedSeries",
                          "$indexes"
                        ]
                      }
                    },
                    "in": {
                      "$subtract": [
                        {
                          "$arrayElemAt": [
                            "$$this",
                            0
                          ]
                        },
                        {
                          "$arrayElemAt": [
                            "$reversedSeries",
                            {
                              "$add": [
                                {
                                  "$arrayElemAt": [
                                    "$$this",
                                    1
                                  ]
                                },
                                1
                              ]
                            }
                          ]
                        }
                      ]
                    }
                  }
                },
                {
                  "$subtract": [
                    {
                      "$size": "$time_series"
                    },
                    1
                  ]
                }
              ]
            }
          },
          "time_series": 1
        }
      }
    ]
)

The pipeline above works in version 3.4+. The $addFields stage adds two fields to each document: "indexes", an array of the indexes of the "time_series" elements built with the $range operator, and "reversedSeries", a reversed copy of the time series built with the $reverseArray operator.

We reverse the array because, in the original (increasing) series, the element at position p is always smaller than the element at position p+1, which means that [p] - [p+1] < 0. Working on the reversed array gives the difference with the right sign directly, so we do not need $multiply here (see the pipeline for version 3.2).

Next we $zip the reversed series with the indexes array and apply a $subtract expression to each pair in the resulting array using the $map operator.

We then $slice the result to discard the trailing null/None value (the last element has no successor) and reverse the result again to restore the original order.
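
To make the stages concrete, here is a minimal plain-Python sketch that mirrors each step of the 3.4+ pipeline on the sample data, with no MongoDB server involved. The function name `derivative_34_style` is hypothetical, and the list operations only approximate what $range, $reverseArray, $zip, $map and $slice do; treat it as an illustration rather than the server's implementation.

```python
def derivative_34_style(time_series):
    """Mirror the 3.4+ pipeline stages with plain lists (illustrative only)."""
    # $addFields: $range over [0, size) and $reverseArray
    indexes = list(range(len(time_series)))
    reversed_series = list(reversed(time_series))

    # $zip pairs each reversed value with its index
    zipped = list(zip(reversed_series, indexes))

    # $map: current value minus the next value of the reversed array.
    # Past the end there is no next value, which we model as None (null).
    mapped = []
    for value, index in zipped:
        has_next = index + 1 < len(reversed_series)
        mapped.append(value - reversed_series[index + 1] if has_next else None)

    # $slice keeps the first size-1 items, dropping the trailing None
    sliced = mapped[: max(len(time_series) - 1, 0)]

    # the outer $reverseArray restores the original order
    return list(reversed(sliced))


print(derivative_34_style([10, 20, 40, 70, 110]))  # [10, 20, 30, 40]
```

Because the subtraction runs on the reversed array, the sign comes out right even when the series decreases somewhere, e.g. `[5, 3, 8]` gives `[-2, 5]`.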

In version 3.2, we can use the $unwind stage to unwind the array and include the index of each element by passing a document as the operand (with the includeArrayIndex option) instead of the usual "$"-prefixed path string.

Next in the pipeline, we $group the documents and use the $push accumulator operator to return an array of sub-documents that looks like this:

{
    "_id" : ObjectId("57c11ddbe860bd0b5df6bc64"),
    "time_series" : [
        { "value" : 10, "index" : NumberLong(0) },
        { "value" : 20, "index" : NumberLong(1) },
        { "value" : 40, "index" : NumberLong(2) },
        { "value" : 70, "index" : NumberLong(3) },
        { "value" : 110, "index" : NumberLong(4) }
    ]
}

Finally comes the $project stage. In this stage, we use the $map operator to apply a series of expressions to each element of the array computed in the $group stage.

Here is what is going on inside the $map "in" expression (think of $map as a for loop):

For each subdocument, we use the $let operator to bind the next element of the array to a variable. We then subtract the "value" field of that next element from the "value" field of the current element.

Since the next element in the array is the element at the current index plus one, all we need is the $arrayElemAt operator and a simple $add of the current element's index and 1.

For an increasing series the $subtract expression returns a negative value, so we multiply it by -1 using the $multiply operator.

We also need to $filter the resulting array because its last element is None or null. The reason is that when the current element is the last element, $subtract returns null because the index of the next element equals the size of the array.

db.collection.aggregate([
  {
    "$unwind": {
      "path": "$time_series",
      "includeArrayIndex": "index"
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "time_series": {
        "$push": {
          "value": "$time_series",
          "index": "$index"
        }
      }
    }
  },
  {
    "$project": {
      "time_series": {
        "$filter": {
          "input": {
            "$map": {
              "input": "$time_series",
              "as": "el",
              "in": {
                "$multiply": [
                  {
                    "$subtract": [
                      "$$el.value",
                      {
                        "$let": {
                          "vars": {
                            "nextElement": {
                              "$arrayElemAt": [
                                "$time_series",
                                {
                                  "$add": [
                                    "$$el.index",
                                    1
                                  ]
                                }
                              ]
                            }
                          },
                          "in": "$$nextElement.value"
                        }
                      }
                    ]
                  },
                  -1
                ]
              }
            }
          },
          "as": "item",
          // drop only the trailing null; "$gte 0" would also drop negative differences
          "cond": {
            "$ne": [
              "$$item",
              null
            ]
          }
        }
      }
    }
  }
])
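
The 3.2 stages above can also be followed in plain Python. This is a minimal sketch under stated assumptions: `derivative_32_style` is a hypothetical name, None stands in for null, and the list operations only imitate $unwind, $group/$push, $map and $filter. The sketch drops only the null entry; a filter condition of "$gte 0" would additionally drop any negative difference, which matters if the series is not strictly increasing.

```python
def derivative_32_style(time_series):
    """Imitate the 3.2 pipeline ($unwind, $group, $map, $filter) with lists."""
    # $unwind with includeArrayIndex, then $group/$push into sub-documents
    elements = [{"value": v, "index": i} for i, v in enumerate(time_series)]

    mapped = []
    for el in elements:
        next_index = el["index"] + 1  # $add: current index + 1
        if next_index < len(elements):
            next_element = elements[next_index]  # $arrayElemAt bound via $let
            # $subtract gives current - next; $multiply by -1 flips the sign
            mapped.append((el["value"] - next_element["value"]) * -1)
        else:
            # no next element: the subtraction evaluates to null
            mapped.append(None)

    # $filter: keep every non-null difference
    return [item for item in mapped if item is not None]


print(derivative_32_style([10, 20, 40, 70, 110]))  # [10, 20, 30, 40]
```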

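Another option, which I think is less efficient, is to perform a map/reduce operation on the collection using the map_reduce method. Note that Collection.map_reduce was removed in PyMongo 4.0, so this requires an older driver version.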

>>> import pymongo
>>> from bson.code import Code
>>> client = pymongo.MongoClient()
>>> db = client.test
>>> collection = db.collection
>>> mapper = Code("""
...               function() {
...                 var derivatives = [];
...                 for (var index=1; index<this.time_series.length; index++) {
...                   derivatives.push(this.time_series[index] - this.time_series[index-1]);
...                 }
...                 emit(this._id, derivatives);
...               }
...               """)
>>> reducer = Code("""
...                function(key, value) {}
...                """)
>>> for res in collection.map_reduce(mapper, reducer, out={'inline': 1})['results']:
...     print(res)  # or do something with the document.
... 
{'value': [10.0, 20.0, 30.0, 40.0], '_id': ObjectId('57c11ddbe860bd0b5df6bc64')}

You can also retrieve all the documents and use numpy.diff on the client side to compute the derivative, like this:

import numpy as np


for document in collection.find({}, {'time_series': 1}):
    result = np.diff(document['time_series'])
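
If NumPy is not available, the same first-order difference can be sketched with the standard library alone. The helper name `first_difference` and the stand-in document below are assumptions for illustration; in practice the document would come from `collection.find(...)` as above. One practical difference: `np.diff` returns a NumPy array, which needs `.tolist()` before it can be stored back in MongoDB, while this helper returns a plain list.

```python
def first_difference(series):
    """Return series[i + 1] - series[i] for each adjacent pair."""
    return [b - a for a, b in zip(series, series[1:])]


# Stand-in for a document returned by collection.find({}, {'time_series': 1})
document = {"time_series": [10, 20, 40, 70, 110]}

derivative = first_difference(document["time_series"])
print({"derivative": derivative})  # {'derivative': [10, 20, 30, 40]}
```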

answered Oct 12 '22 12:10

styvane

It's a bit dirty, but perhaps something like this?

use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})

var mapF = function() {
    emit(this.id, this.time_series);
    emit(this.id, this.time_series);
};

var reduceF = function(key, values){
    var n = values[0].length;
    var ret = [];
    for(var i = 0; i < n-1; i++){
        ret.push( values[0][i+1] - values[0][i] );
    }
    return {'gradient': ret};
};

var finalizeF = function(key, val){
    return val.gradient;
}

db['data'].mapReduce(
    mapF,
    reduceF,
    { out: 'data_d1', finalize: finalizeF }
)

db['data_d1'].find({})

The "strategy" here is to emit the data to be operated on twice so that the reduce function is actually invoked and has access to it, return an object to avoid the message "reduce -> multiple not supported yet", and then extract the array from that object in the finalizer.

This script then produces:

MongoDB shell version: 3.2.9
connecting to: test
switched to db test_db
WriteResult({ "nRemoved" : 1 })
WriteResult({ "nInserted" : 1 })
{
    "result" : "data_d1",
    "timeMillis" : 13,
    "counts" : {
        "input" : 1,
        "emit" : 2,
        "reduce" : 1,
        "output" : 1
    },
    "ok" : 1
}
{ "_id" : 1, "value" : [ 10, 20, 30, 40 ] }
bye

Alternatively, one could move all the processing into the finalizer (reduceF is not called here since mapF is assumed to emit unique keys):

use test_db
db['data'].remove({})
db['data'].insert({id: 1, time_series: [10,20,40,70,110]})

var mapF = function() {
    emit(this.id, this.time_series);
};

var reduceF = function(key, values){
};

var finalizeF = function(key, val){
    var x = val;
    var n = x.length;

    var ret = [];
    for(var i = 0; i < n-1; i++){
        ret.push( x[i+1] - x[i] );
    }
    return ret;
}

db['data'].mapReduce(
    mapF,
    reduceF,
    { out: 'data_d1', finalize: finalizeF }
)

db['data_d1'].find({})
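
To see why the two variants behave the same, here is a toy Python model of the map/reduce flow. It is only a sketch: `run_map_reduce` and `diff` are hypothetical helpers, not part of any MongoDB driver, and the model captures just one rule, namely that the reduce function is not called for a key with a single emitted value.

```python
from collections import defaultdict


def run_map_reduce(documents, map_fn, reduce_fn, finalize_fn):
    """Toy model: group emitted values by key, reduce only multi-value keys."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)

    results = {}
    for key, values in emitted.items():
        # reduce is skipped when a key has exactly one emitted value
        reduced = values[0] if len(values) == 1 else reduce_fn(key, values)
        results[key] = finalize_fn(key, reduced)
    return results


def diff(xs):
    """Adjacent differences, as in the JavaScript loops above."""
    return [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]


docs = [{"id": 1, "time_series": [10, 20, 40, 70, 110]}]

# Variant 1: emit twice so reduce runs; the finalizer unwraps the object
twice = run_map_reduce(
    docs,
    map_fn=lambda d: [(d["id"], d["time_series"])] * 2,
    reduce_fn=lambda key, values: {"gradient": diff(values[0])},
    finalize_fn=lambda key, val: val["gradient"],
)

# Variant 2: emit once; reduce never runs and the finalizer does the work
once = run_map_reduce(
    docs,
    map_fn=lambda d: [(d["id"], d["time_series"])],
    reduce_fn=lambda key, values: None,
    finalize_fn=lambda key, val: diff(val),
)

print(twice)  # {1: [10, 20, 30, 40]}
print(once)   # {1: [10, 20, 30, 40]}
```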

answered Oct 12 '22 14:10

ewcz
