Is there a way to calculate the median using the MongoDB aggregation framework?
The median aggregation returns the median value of the specified measure, grouped by the chosen dimension or dimensions. For example, median(revenue) returns the median revenue grouped by the (optional) chosen dimension.
Aggregation in MongoDB lets you transform data and results more powerfully than the find() command does. By chaining multiple stages and expressions, you build a "pipeline" of operations that performs analytic work on your data.
Before MongoDB 2.6, aggregate + $match returned one big monolithic BSON document containing all matching documents, subject to the 16 MB document size limit. Since 2.6, aggregate returns a cursor, just as find does, so you can fetch the matching documents one by one.
The median is somewhat tricky to compute in the general case, because it involves sorting the whole data set, or using a recursion with a depth that is also proportional to the data set size. That may be why many databases don't have a median operator out of the box (MySQL doesn't have one, either).
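To make the sorting requirement concrete, here is a minimal in-memory sketch in plain JavaScript. The helper name medianBySorting is hypothetical and nothing here touches MongoDB; it only shows that the whole list has to be ordered before the middle can be read off.

```javascript
// Exact median of an in-memory array of numbers (hypothetical helper).
function medianBySorting(values) {
  if (values.length === 0) {
    throw new Error("median of an empty list is undefined");
  }
  // Copy before sorting so the caller's array is left untouched; the
  // comparator is required because the default sort compares as strings.
  const sorted = [...values].sort((x, y) => x - y);
  // Zero-based indexes of the two middle elements; they coincide
  // when the length is odd.
  const lower = Math.floor((sorted.length - 1) / 2);
  const upper = Math.ceil((sorted.length - 1) / 2);
  return (sorted[lower] + sorted[upper]) / 2;
}

console.log(medianBySorting([5, 1, 4, 2])); // 3
console.log(medianBySorting([7, 1, 3]));    // 3
```

The sort dominates the cost, which is the O(n log n) step a database would also have to pay unless it keeps an index on the field.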
The simplest way to compute the median is with these two statements (assuming the attribute whose median we want is called a and we want it over all documents in the collection coll):
count = db.coll.countDocuments({});
db.coll.find().sort( {"a":1} ).skip(Math.floor((count - 1) / 2)).limit(1);
For an odd count this returns the document holding the median; for an even count it returns the lower of the two middle documents.
This is equivalent to what people suggest for MySQL.
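With an even number of documents, a single-document query can only return one of the two middle values. The sketch below shows one way the skip and limit could be chosen so that both parities are covered; medianWindow is a hypothetical helper, not part of any MongoDB API.

```javascript
// Returns the skip/limit pair that selects the middle document(s)
// of a sorted result set containing `count` documents.
function medianWindow(count) {
  if (count < 1) {
    throw new Error("count must be at least 1");
  }
  return {
    // Zero-based position of the lower middle document.
    skip: Math.floor((count - 1) / 2),
    // An even count has two middle documents, an odd count has one.
    limit: count % 2 === 0 ? 2 : 1,
  };
}

console.log(medianWindow(4)); // { skip: 1, limit: 2 }
console.log(medianWindow(5)); // { skip: 2, limit: 1 }

// Possible use in mongosh (assumes the collection `coll` and field `a`):
//   const w = medianWindow(db.coll.countDocuments({}));
//   const docs = db.coll.find().sort({ a: 1 }).skip(w.skip).limit(w.limit).toArray();
//   const median = docs.reduce((sum, d) => sum + d.a, 0) / docs.length;
```

Averaging the returned documents on the client gives the exact median at the cost of one extra document for even counts.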
It's possible to do it in one shot with the aggregation framework.
Sort => collect the sorted values into an array => get the size of the array => divide the size by two => subtract 0.5 to get the zero-based middle position, (size - 1) / 2 => truncate it (left index) => round it up (right index) => get the array elements at the left and right indexes => average the two elements. For an odd size both indexes point at the same element.
This is a sample using Spring's Java MongoTemplate:
The model is a list of books, each with the login of its author ("owner"); the objective is to get the median number of books per user:
// Assumes static imports from org.springframework.data.mongodb.core.aggregation.Aggregation.
// Count the books per owner, sort the counts, and collect them into one array.
GroupOperation countByBookOwner = group("owner").count().as("nbBooks");
SortOperation sortByCount = sort(Direction.ASC, "nbBooks");
GroupOperation putInArray = group().push("nbBooks").as("nbBooksArray");
ProjectionOperation getSizeOfArray = project("nbBooksArray").and("nbBooksArray").size().as("size");
// Zero-based middle position: size / 2 - 0.5 = (size - 1) / 2.
ProjectionOperation divideSizeByTwo = project("nbBooksArray").and("size").divide(2).as("halfSize");
ProjectionOperation getMiddlePosition = project("nbBooksArray").and("halfSize").minus(0.5).as("middleFloat");
// Left index = trunc(middle), right index = ceil(middle); equal when the size is odd.
ProjectionOperation getLeftIndex = project("middleFloat", "nbBooksArray")
    .and("middleFloat").project("trunc").as("beginMiddle");
ProjectionOperation getRightIndex = project("beginMiddle", "middleFloat", "nbBooksArray")
    .and("middleFloat").project("ceil").as("endMiddle");
ProjectionOperation arrayElementAt = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray")
    .and("nbBooksArray").project("arrayElemAt", "$beginMiddle").as("beginValue")
    .and("nbBooksArray").project("arrayElemAt", "$endMiddle").as("endValue");
// The median is the average of the two middle values.
ProjectionOperation averageForMedian = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray", "beginValue", "endValue")
    .and("beginValue").project("avg", "$endValue").as("median");
Aggregation aggregation = newAggregation(countByBookOwner, sortByCount, putInArray, getSizeOfArray, divideSizeByTwo,
    getMiddlePosition, getLeftIndex, getRightIndex, arrayElementAt, averageForMedian);
AggregationResults<MedianContainer> groupResults = mongoTemplate.aggregate(aggregation, "book", MedianContainer.class);
And here is the resulting aggregation:
{ "aggregate": "book",
  "pipeline": [
    { "$group": { "_id": "$owner", "nbBooks": { "$sum": 1 } } },
    { "$sort": { "nbBooks": 1 } },
    { "$group": { "_id": null, "nbBooksArray": { "$push": "$nbBooks" } } },
    { "$project": { "nbBooksArray": 1, "size": { "$size": ["$nbBooksArray"] } } },
    { "$project": { "nbBooksArray": 1, "halfSize": { "$divide": ["$size", 2] } } },
    { "$project": { "nbBooksArray": 1, "middleFloat": { "$subtract": ["$halfSize", 0.5] } } },
    { "$project": { "middleFloat": 1, "nbBooksArray": 1, "beginMiddle": { "$trunc": ["$middleFloat"] } } },
    { "$project": { "beginMiddle": 1, "middleFloat": 1, "nbBooksArray": 1, "endMiddle": { "$ceil": ["$middleFloat"] } } },
    { "$project": { "beginMiddle": 1, "endMiddle": 1, "middleFloat": 1, "nbBooksArray": 1,
        "beginValue": { "$arrayElemAt": ["$nbBooksArray", "$beginMiddle"] },
        "endValue": { "$arrayElemAt": ["$nbBooksArray", "$endMiddle"] } } },
    { "$project": { "beginMiddle": 1, "endMiddle": 1, "middleFloat": 1, "nbBooksArray": 1, "beginValue": 1, "endValue": 1,
        "median": { "$avg": ["$beginValue", "$endValue"] } } }
  ]
}
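On MongoDB 7.0 or later, the $median accumulator can stand in for the hand-built index arithmetic. The following is a minimal sketch, assuming the same book collection and owner field as the example above; medianPipeline is an illustrative name. The only supported method is "approximate", so the result is not guaranteed to equal the exact averaged median.

```javascript
// Pipeline sketch for MongoDB 7.0+ (assumption: collection "book",
// one document per book, author login stored in "owner").
const medianPipeline = [
  // Count the books per owner.
  { $group: { _id: "$owner", nbBooks: { $sum: 1 } } },
  // Let the server compute the median of those counts; no explicit
  // $sort or array stage is needed for the accumulator.
  {
    $group: {
      _id: null,
      median: { $median: { input: "$nbBooks", method: "approximate" } },
    },
  },
];

// In mongosh this would be run as: db.book.aggregate(medianPipeline)
console.log(JSON.stringify(medianPipeline, null, 2));
```

On older servers the multi-stage pipeline remains the way to get the value in a single round trip.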