mongodb aggregation multiple unknown nested keys

I have a collection called 'InventoryPerDay' whose documents contain the inventory of each store for each day:

{
  _id: "20131202/store_a",
  _metadata: {
    date: ISODate("2013-12-02T00:00:00Z"),
    store: "store_a"
  },
  inventory: {
    quantity: {
      item_44: 1350,
      item_32: 1,
      item_2: 1,
      item_9: 1
    }
  }
},
{
  _id: "20131201/store_a",
  _metadata: {
    date: ISODate("2013-12-01T00:00:00Z"),
    store: "store_a"
  },
  inventory: {
    quantity: {
      item_44: 1000,
      item_32: 5,
      item_2: 10
    }
  }
}

I need the total quantity of each item in store_a for both of the days. The items in the 'quantity' hash are unknown. You can see that 'item_9' exists for 02/12/2013 but not for 01/12/2013.

How can I sum unknown nested keys across multiple documents using aggregation in MongoDB?

The result for the above example should be:

{
  store: "store_a",
  inventory: {
    quantity: {
      item_44: 2350,
      item_32: 6,
      item_2: 11,
      item_9: 1
    }
  }
}
asked Dec 12 '13 10:12 by refaelos


1 Answer

Unfortunately, what you are trying to do is not possible with the current capabilities of MongoDB (not with the aggregation framework, at least). If you keep your current schema, you will have to use map-reduce, which will be much slower.
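For reference, here is a rough sketch of what the map-reduce route over the current schema could look like. It is plain JavaScript that runs in memory rather than against a server, and the names `mapDoc`, `reduceValues` and `runMapReduce` are made up for illustration. In MongoDB itself, equivalent map and reduce functions would be passed to `db.collection.mapReduce`.

```javascript
// Sample documents in the original schema (taken from the question).
const inventoryDocs = [
  {
    _id: "20131202/store_a",
    _metadata: { date: new Date("2013-12-02T00:00:00Z"), store: "store_a" },
    inventory: { quantity: { item_44: 1350, item_32: 1, item_2: 1, item_9: 1 } }
  },
  {
    _id: "20131201/store_a",
    _metadata: { date: new Date("2013-12-01T00:00:00Z"), store: "store_a" },
    inventory: { quantity: { item_44: 1000, item_32: 5, item_2: 10 } }
  }
];

// Map step: emit the whole quantity object, keyed by store.
// (Hypothetical helper; in the shell the map function would call emit().)
function mapDoc(doc, emit) {
  emit(doc._metadata.store, doc.inventory.quantity);
}

// Reduce step: merge quantity objects by summing each item key.
// The output has the same shape as the inputs, so it can be re-reduced.
function reduceValues(key, values) {
  const totals = {};
  for (const quantity of values) {
    for (const item of Object.keys(quantity)) {
      totals[item] = (totals[item] || 0) + quantity[item];
    }
  }
  return totals;
}

// Tiny in-memory driver standing in for the server (an assumption for this sketch).
function runMapReduce(documents) {
  const emitted = new Map();
  for (const doc of documents) {
    mapDoc(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const out = [];
  for (const [key, values] of emitted) {
    out.push({ _id: key, value: reduceValues(key, values) });
  }
  return out;
}

const mapReduceResult = runMapReduce(inventoryDocs);
console.log(JSON.stringify(mapReduceResult, null, 2));
```

Because the reduce function does not need to know the item names in advance, this works with the unknown keys, at the cost of the speed penalty mentioned above.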

You can refer here for an explanation of why your schema is not optimal.

You can refer here to understand how you should design your schema and what your aggregation should look like.

Also, what you are asking would be possible with your current schema if you could project the names of the fields to values, so you may want to vote for this ticket so that it gets more attention.


Update

You need to change your schema as follows:

{
  _id: "20131202/store_a",
  _metadata: {
    date: ISODate("2013-12-02T00:00:00Z"),
    store: "store_a"
  },
  inventory: {
    quantities: [
      { k : "item_44", v: 1350},
      { k : "item_32", v: 1},
      { k : "item_2", v: 1},
      { k : "item_9", v: 1},
    ]
  }
},
.
.
.
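If you already have data in the old shape, the conversion itself is mechanical. Below is a minimal sketch in plain JavaScript, assuming each document carries its items under `inventory.quantity`. The function name `toQuantitiesSchema` is hypothetical, and writing the converted documents back to the collection is left out.

```javascript
// Convert one document from the keyed `inventory.quantity` object
// to the `inventory.quantities` array of { k, v } pairs.
function toQuantitiesSchema(doc) {
  // Treat a missing quantity object as an empty inventory.
  const quantity = (doc.inventory && doc.inventory.quantity) || {};
  const quantities = Object.keys(quantity).map((name) => ({
    k: name,
    v: quantity[name]
  }));
  return {
    _id: doc._id,
    _metadata: doc._metadata,
    inventory: { quantities: quantities }
  };
}

// Example input taken from the question.
const oldDoc = {
  _id: "20131201/store_a",
  _metadata: { date: new Date("2013-12-01T00:00:00Z"), store: "store_a" },
  inventory: { quantity: { item_44: 1000, item_32: 5, item_2: 10 } }
};

const newDoc = toQuantitiesSchema(oldDoc);
console.log(JSON.stringify(newDoc.inventory.quantities));
```

The item name moves from a field name into the value `k`, which is what lets the aggregation framework group on it.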

The query should then look like this:

db.InventoryPerDay.aggregate(
 [
    // Emit one document per element of the quantities array
    {
        "$unwind" : "$inventory.quantities"
    },
    // Sum the values per (store, item) pair
    {
        "$group" : {
            "_id" : { "store": "$_metadata.store", "item" : "$inventory.quantities.k"},
            "total" : {
                "$sum" : "$inventory.quantities.v"
            }
        }
    }
])

This will give you a result like this

{
  result :
   [
    { "_id" : { store : "store_a", item : "item_44"},
      "total" : 2350
    },
    { "_id" : { store : "store_a", item : "item_32"},
      "total" : 6
    },
    { "_id" : { store : "store_a", item : "item_2"},
      "total" : 11
    },
    { "_id" : { store : "store_a", item : "item_9"},
      "total" : 1
    }
  ]
}
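If you want to check these totals without a running server, the two stages can be emulated in plain JavaScript. This is only a sketch of what `$unwind` and `$group` do to the sample data. The helper names `unwindQuantities` and `groupTotals` are made up, and the order of the groups here follows insertion order, which a real `$group` does not guarantee.

```javascript
// The two sample documents, already converted to the new schema.
const convertedDocs = [
  {
    _id: "20131202/store_a",
    _metadata: { store: "store_a" },
    inventory: { quantities: [
      { k: "item_44", v: 1350 }, { k: "item_32", v: 1 },
      { k: "item_2", v: 1 }, { k: "item_9", v: 1 }
    ] }
  },
  {
    _id: "20131201/store_a",
    _metadata: { store: "store_a" },
    inventory: { quantities: [
      { k: "item_44", v: 1000 }, { k: "item_32", v: 5 }, { k: "item_2", v: 10 }
    ] }
  }
];

// Emulates the $unwind stage: one row per array element.
function unwindQuantities(documents) {
  const rows = [];
  for (const doc of documents) {
    for (const entry of doc.inventory.quantities) {
      rows.push({ store: doc._metadata.store, item: entry.k, amount: entry.v });
    }
  }
  return rows;
}

// Emulates the $group stage: sum the amounts per (store, item) pair.
function groupTotals(rows) {
  const totals = new Map();
  for (const row of rows) {
    const key = JSON.stringify([row.store, row.item]);
    if (!totals.has(key)) {
      totals.set(key, { _id: { store: row.store, item: row.item }, total: 0 });
    }
    totals.get(key).total += row.amount;
  }
  return Array.from(totals.values());
}

const emulatedResult = groupTotals(unwindQuantities(convertedDocs));
console.log(JSON.stringify(emulatedResult, null, 2));
```

Note that `item_9`, which appears in only one of the two documents, simply ends up as a group with a single contribution.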

You can format these results with the $project operator at the end of the aggregation pipeline.
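If you need exactly the nested shape from the question, with the item names as keys again, one option is to rebuild it on the client from the aggregation output. The sketch below assumes the result rows have the `_id.store`, `_id.item` and `total` fields shown above. `nestByStore` is a hypothetical helper name, not part of any MongoDB API.

```javascript
// Rows in the shape returned by the aggregation above.
const aggregationRows = [
  { _id: { store: "store_a", item: "item_44" }, total: 2350 },
  { _id: { store: "store_a", item: "item_32" }, total: 6 },
  { _id: { store: "store_a", item: "item_2" }, total: 11 },
  { _id: { store: "store_a", item: "item_9" }, total: 1 }
];

// Fold the flat (store, item, total) rows back into one object per store,
// using the item name as the key under inventory.quantity.
function nestByStore(rows) {
  const byStore = new Map();
  for (const row of rows) {
    const store = row._id.store;
    if (!byStore.has(store)) {
      byStore.set(store, { store: store, inventory: { quantity: {} } });
    }
    byStore.get(store).inventory.quantity[row._id.item] = row.total;
  }
  return Array.from(byStore.values());
}

const nested = nestByStore(aggregationRows);
console.log(JSON.stringify(nested, null, 2));
```

This keeps the stored schema aggregation-friendly while still handing the application the structure it originally asked for.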

As for map-reduce vs. aggregation: map-reduce is much slower than aggregation, mainly because it is executed on a single thread. You can vote for this ticket so that it gets implemented for multiple cores. In most cases, map-reduce takes orders of magnitude more time to compute than aggregation.

answered Nov 10 '22 17:11 by Makis Tsantekidis