I'm having some trouble working out the best way to do this in MongoDB. Arguably it's a relational data set, so I will probably be slated. Still, it's a challenge to see if it's possible.
I need to order Logistics Managers by the daily average miles across the vans in their department and, in a separate list, by a combined weekly average.
My first setup in the database was as follows:
{
"_id" : ObjectId("555cf04fa3ed8cc2347b23d7"),
"name" : "My Manager 1",
"vans" : [
{
"name" : "van1",
"miles" : NumberLong(56)
},
{
"name" : "van2",
"miles" : NumberLong(34)
}
]
}
But I can't see how to order by a nested array value without knowing the parent array keys (these will be the standard 0-x).
So my next choice was to scrap that idea and just have the name in the first collection and the vans in a second collection, each with the id of the manager.
So, removing vans from the above example and adding this collection (vans):
{
"_id" : ObjectId("555cf04fa3ed8cc2347b23d9"),
"name" : "van1",
"miles" : NumberLong(56),
"manager_id" : "555cf04fa3ed8cc2347b23d7"
}
But because I need to show the results by manager, how do I order in a query (if possible) by the average miles in this collection where manager_id = x, and then display the manager by his id?
Thanks for your help
If the Manager is going to have a limited number of Vans, then your first approach is better, as you do not have to make two separate calls/queries to the database to collect your information.
Then comes the question of how to calculate the average mileage per Manager, where the Aggregation Framework will help you a lot. Here is a query that will get you the desired data:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id:
{
_id: "$_id",
name: "$name"
},
avg_milage: {$avg: "$vans.miles"}
}
},
{$sort: {"avg_milage": -1}},
{$project:
{_id: "$_id._id",
name: "$_id.name",
avg_milage: "$avg_milage"
}
}
])
The first $unwind step simply unwraps the vans array and creates a separate document for each element of the array.
Then the $group stage gathers all documents with the same (_id, name) pair and, in the avg_milage field, computes the average value of the miles field across those documents.
The $sort stage just sorts the documents in descending order, using the new avg_milage field as the sort key.
And finally, the last $project step just cleans up the documents by making appropriate projections, just for beauty :)
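To make the stages above more concrete, here is a rough plain-JavaScript sketch that mimics what $unwind, $group, $project and $sort do, using an in-memory array instead of a real collection. It is only an illustration of the data flow, not how MongoDB executes the pipeline, and "My Manager 2" with its vans is made-up sample data that does not come from the question.

```javascript
// Hypothetical in-memory copy of the manager documents.
// "My Manager 2" and its vans are invented for illustration.
const managers = [
  { _id: "m1", name: "My Manager 1", vans: [{ name: "van1", miles: 56 }, { name: "van2", miles: 34 }] },
  { _id: "m2", name: "My Manager 2", vans: [{ name: "van3", miles: 80 }, { name: "van4", miles: 70 }] },
];

// $unwind: one output document per element of the vans array.
const unwound = managers.flatMap((m) =>
  m.vans.map((van) => ({ _id: m._id, name: m.name, vans: van }))
);

// $group: collect the miles values of all documents sharing the same manager.
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc._id)) {
    groups.set(doc._id, { _id: doc._id, name: doc.name, miles: [] });
  }
  groups.get(doc._id).miles.push(doc.vans.miles);
}

// $avg + $project + $sort: average per group, flat shape, descending order.
const result = [...groups.values()]
  .map((g) => ({
    _id: g._id,
    name: g.name,
    avg_milage: g.miles.reduce((a, b) => a + b, 0) / g.miles.length,
  }))
  .sort((a, b) => b.avg_milage - a.avg_milage);

console.log(result);
```

With this sample data, the manager whose vans average more miles comes first, which is the ordering the aggregation query produces.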
A similar thing is needed for your second desired result:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id:
{
_id: "$_id",
name: "$name"
},
total_milage: {$sum: "$vans.miles"}
}
},
{$sort: {"total_milage": -1}},
{$project:
{_id: "$_id._id",
name: "$_id.name",
weekly_milage: {
$multiply: [
"$total_milage",
7
]
}
}
}
])
This will produce the list of Managers with their weekly mileage, sorted in descending order (assuming each van's miles value is a daily figure, so the weekly figure is the daily total multiplied by 7). So you can $limit the result and get the Manager with the highest mileage, for instance.
And in pretty much the same way, you can grab info for your vans:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id: "$vans.name",
total_milage: {$sum: "$vans.miles"}
}
},
{$sort: {"total_milage": -1}},
{$project:
{van_name: "$_id",
weekly_milage: {
$multiply: [
"$total_milage",
7
]
}
}
}
])
First, do you require average miles for a single day, average miles over a given time period, or average miles over the life of the manager? I would consider adding a timestamp field. Yes, _id has a timestamp, but this only reflects the time the document was created, not necessarily the time of the initial day's log.
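As a small illustration of the point about _id: the first 4 bytes of an ObjectId hold its creation time in seconds since the Unix epoch, which is why it only tells you when the document was inserted. The sketch below decodes that from the hex string in plain JavaScript; objectIdCreationDate is a hypothetical helper name, and in the mongo shell the built-in ObjectId getTimestamp() method gives the same information.

```javascript
// Hypothetical helper: decode the creation time embedded in an ObjectId.
// The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
function objectIdCreationDate(hexId) {
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// _id value taken from the sample manager document in the question.
const created = objectIdCreationDate("555cf04fa3ed8cc2347b23d7");
console.log(created.toISOString());
```

If the mileage for a given day is logged later than that day, or edited afterwards, this value will not reflect it, so an explicit timestamp field is the safer choice for date-range queries.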
Considerations for the first data model:
Considerations for the second data model:
As @n9code has pointed out, the aggregation framework is the answer in both cases.
For the first data model, assuming each document represents one day and you want to retrieve an average for a given day or a range of days:
db.collection.aggregate([
{ $match: {
name: 'My Manager 1',
timestamp: { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $unwind: '$vans' },
{ $group: {
_id: {
_id: '$_id',
name: '$name',
timestamp: '$timestamp'
},
avg_mileage: {
$avg: '$vans.miles'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
_id: '$_id._id',
name: '$_id.name',
timestamp: '$_id.timestamp',
avg_mileage: 1
} }
]);
If, for the first data model, each document represents a manager and the "vans" array grows daily, this particular data model is not ideal for two reasons:
For the sake of completeness, here is the query:
/*
Assuming data model is:
{
_id: ...,
name: ...,
vans: [
{ name: ..., miles: ..., timestamp: ... }
]
}
*/
db.collection.aggregate([
{ $match: {
name: 'My Manager 1'
} },
{ $unwind: '$vans' },
{ $match: {
'vans.timestamp': { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $group: {
_id: {
_id: '$_id',
name: '$name'
},
avg_mileage: {
$avg: '$vans.miles'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
_id: '$_id._id',
name: '$_id.name',
avg_mileage: 1
} }
]);
For the second data model, aggregation is more straightforward. I'm assuming the inclusion of a timestamp:
db.collection.aggregate([
{ $match: {
// Match manager_id using the type it is stored as (the question's sample stores it as a string).
manager_id: ObjectId('555cf04fa3ed8cc2347b23d7'),
timestamp: { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $group: {
_id: '$manager_id',
avg_mileage: {
$avg: '$miles'
},
names: {
$addToSet: '$name'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
manager_id: '$_id',
avg_mileage: 1,
names: 1
} }
]);
I have added an array of names (vehicles?) used during the average computation.
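Since the question also asks how to display the manager for each result, here is a hedged plain-JavaScript sketch of the second data model: it groups vans by manager_id (mirroring $group with $avg and $addToSet) and then resolves each manager's name from the managers collection. Both arrays are in-memory stand-ins, and the second manager and third van are invented sample data. Against a real database the name lookup would be a second query or, on MongoDB versions that support it, a $lookup stage.

```javascript
// Hypothetical in-memory copies of the two collections.
const managers = [
  { _id: "555cf04fa3ed8cc2347b23d7", name: "My Manager 1" },
  { _id: "manager-2", name: "My Manager 2" }, // invented second manager
];
const vans = [
  { name: "van1", miles: 56, manager_id: "555cf04fa3ed8cc2347b23d7" },
  { name: "van2", miles: 34, manager_id: "555cf04fa3ed8cc2347b23d7" },
  { name: "van3", miles: 90, manager_id: "manager-2" }, // invented van
];

// Group by manager_id: running sum and count (for $avg), a Set (for $addToSet).
const byManager = new Map();
for (const van of vans) {
  const g = byManager.get(van.manager_id) ?? { sum: 0, count: 0, names: new Set() };
  g.sum += van.miles;
  g.count += 1;
  g.names.add(van.name);
  byManager.set(van.manager_id, g);
}

// Resolve manager names, compute averages, and sort in descending order.
const nameById = new Map(managers.map((m) => [m._id, m.name]));
const report = [...byManager.entries()]
  .map(([manager_id, g]) => ({
    manager_id,
    manager_name: nameById.get(manager_id),
    avg_mileage: g.sum / g.count,
    names: [...g.names],
  }))
  .sort((a, b) => b.avg_mileage - a.avg_mileage);

console.log(report);
```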
Relevant documentation: