
Calculate the median in MongoDB aggregation framework

Is there a way to calculate the median using the MongoDB aggregation framework?

asked Dec 08 '13 16:12 by user3080286



2 Answers

The median is somewhat tricky to compute in the general case, because it involves either sorting the whole data set or using a recursion whose depth is also proportional to the data set size. That may be why many databases don't have a median operator out of the box (MySQL doesn't have one, either).

The simplest way to compute the median would be with these two statements (assuming the attribute on which we want to compute the median is called a and we want it over all documents in the collection, coll):

    count = db.coll.count();
    // Math.floor keeps the skip offset a whole number when count is odd
    db.coll.find().sort({"a": 1}).skip(Math.floor((count - 1) / 2)).limit(1);

This is equivalent to what people suggest for MySQL.
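
For an even number of documents there are two middle values, and the query above returns only one of them. A minimal sketch of the index arithmetic follows, written as plain JavaScript so it runs without a server. `medianWindow` and `medianOfSorted` are hypothetical helper names, and the in-memory array is an assumption standing in for the sorted cursor.

```javascript
// Hypothetical helper: which documents to fetch from the sorted cursor.
// Returns the skip and limit values for find().sort().skip().limit().
function medianWindow(count) {
  if (count <= 0) return null;               // no documents, no median
  const skip = Math.floor((count - 1) / 2);  // index of the lower middle element
  const limit = count % 2 === 0 ? 2 : 1;     // even counts have two middle elements
  return { skip, limit };
}

// Stand-in for the query: `sorted` plays the role of find().sort({a: 1}).
function medianOfSorted(sorted) {
  const w = medianWindow(sorted.length);
  if (w === null) return null;
  const middle = sorted.slice(w.skip, w.skip + w.limit);
  return middle.reduce((sum, v) => sum + v, 0) / middle.length;
}

console.log(medianOfSorted([1, 3, 5]));     // 3
console.log(medianOfSorted([1, 3, 5, 7]));  // 4
```

Against a real collection, the same two numbers would go into db.coll.find().sort({a: 1}).skip(w.skip).limit(w.limit), with the returned values averaged on the client.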

answered Sep 21 '22 07:09 by drmirror


It's possible to do it in one shot with the aggregation framework.

1. Sort the values.
2. Push the sorted values into an array.
3. Get the size of the array.
4. Divide the size by two.
5. Take the integer part of the division (left side of the median).
6. Add 1 to the left side (right side of the median).
7. Get the array elements at the left side and the right side.
8. Average the two elements.
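
The same steps can be sketched on a plain in-memory array, which makes the index arithmetic easy to check. This is a sketch under assumptions: `medianBySteps` is a hypothetical name, and the indices are zero-based, as they are for `$arrayElemAt`. With zero-based indices the two middle elements of an even-sized array sit at size/2 - 1 and size/2, and an odd-sized array has a single middle element at the integer part of size/2.

```javascript
// Median of an array of numbers: sort, find the middle index or indices, average.
function medianBySteps(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // ascending copy, input untouched
  const size = sorted.length;
  const right = Math.trunc(size / 2);               // integer part of size / 2
  // Zero-based indexing: for an even size the left neighbour is right - 1;
  // for an odd size the single middle element is used for both sides.
  const left = size % 2 === 0 ? right - 1 : right;
  return (sorted[left] + sorted[right]) / 2;
}

console.log(medianBySteps([4, 1, 3, 2])); // 2.5
console.log(medianBySteps([5, 1, 3]));    // 3
```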

Here is a sample using Spring Data MongoDB's MongoTemplate (Java).

The model is a list of books, each with the login of its author ("owner"); the objective is to get the median number of books per user:

    // Assumes static imports of group, sort, project and newAggregation
    // from org.springframework.data.mongodb.core.aggregation.Aggregation.

    // Count the books per owner, sort the counts, then collect them into one array.
    GroupOperation countByBookOwner = group("owner").count().as("nbBooks");
    SortOperation sortByCount = sort(Direction.ASC, "nbBooks");
    GroupOperation putInArray = group().push("nbBooks").as("nbBooksArray");

    // Compute the two middle indices from the array size.
    ProjectionOperation getSizeOfArray = project("nbBooksArray")
            .and("nbBooksArray").size().as("size");
    ProjectionOperation divideSizeByTwo = project("nbBooksArray")
            .and("size").divide(2).as("middleFloat");
    ProjectionOperation getIntValueOfDivisionForBornLeft = project("middleFloat", "nbBooksArray")
            .and("middleFloat").project("trunc").as("beginMiddle");
    ProjectionOperation add1ToBornLeftToGetBornRight = project("beginMiddle", "middleFloat", "nbBooksArray")
            .and("beginMiddle").project("add", 1).as("endMiddle");

    // Read the two middle values and average them.
    ProjectionOperation arrayElementAt = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray")
            .and("nbBooksArray").project("arrayElemAt", "$beginMiddle").as("beginValue")
            .and("nbBooksArray").project("arrayElemAt", "$endMiddle").as("endValue");
    ProjectionOperation averageForMedian = project("beginMiddle", "endMiddle", "middleFloat", "nbBooksArray",
            "beginValue", "endValue")
            .and("beginValue").project("avg", "$endValue").as("median");

    Aggregation aggregation = newAggregation(countByBookOwner, sortByCount, putInArray, getSizeOfArray,
            divideSizeByTwo, getIntValueOfDivisionForBornLeft, add1ToBornLeftToGetBornRight, arrayElementAt,
            averageForMedian);

    long time = System.currentTimeMillis();
    AggregationResults<MedianContainer> groupResults = mongoTemplate.aggregate(aggregation, "book",
            MedianContainer.class);

And here is the resulting aggregation:

    {
      "aggregate": "book",
      "pipeline": [
        { "$group": { "_id": "$owner", "nbBooks": { "$sum": 1 } } },
        { "$sort": { "nbBooks": 1 } },
        { "$group": { "_id": null, "nbBooksArray": { "$push": "$nbBooks" } } },
        { "$project": {
            "nbBooksArray": 1,
            "size": { "$size": ["$nbBooksArray"] }
        } },
        { "$project": {
            "nbBooksArray": 1,
            "middleFloat": { "$divide": ["$size", 2] }
        } },
        { "$project": {
            "middleFloat": 1,
            "nbBooksArray": 1,
            "beginMiddle": { "$trunc": ["$middleFloat"] }
        } },
        { "$project": {
            "beginMiddle": 1,
            "middleFloat": 1,
            "nbBooksArray": 1,
            "endMiddle": { "$add": ["$beginMiddle", 1] }
        } },
        { "$project": {
            "beginMiddle": 1,
            "endMiddle": 1,
            "middleFloat": 1,
            "nbBooksArray": 1,
            "beginValue": { "$arrayElemAt": ["$nbBooksArray", "$beginMiddle"] },
            "endValue": { "$arrayElemAt": ["$nbBooksArray", "$endMiddle"] }
        } },
        { "$project": {
            "beginMiddle": 1,
            "endMiddle": 1,
            "middleFloat": 1,
            "nbBooksArray": 1,
            "beginValue": 1,
            "endValue": 1,
            "median": { "$avg": ["$beginValue", "$endValue"] }
        } }
      ]
    }
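
The chain of `$project` stages can probably be folded into a single stage. The following is a sketch for the mongo shell or Node.js, not the pipeline generated above. `medianBooksPerOwnerPipeline` is a hypothetical helper and `counts` is a hypothetical field name. The sketch assumes MongoDB 3.2 or later for `$arrayElemAt` and `$floor`, and it uses the zero-based middle indices floor((n - 1) / 2) and floor(n / 2), which coincide when n is odd.

```javascript
// Hypothetical helper: builds a pipeline for the median number of books per owner.
function medianBooksPerOwnerPipeline() {
  const n = { $size: "$counts" }; // number of owners, reused in both index expressions
  return [
    // One document per owner with that owner's book count.
    { $group: { _id: "$owner", nbBooks: { $sum: 1 } } },
    // Sort the counts so the pushed array is in ascending order.
    { $sort: { nbBooks: 1 } },
    { $group: { _id: null, counts: { $push: "$nbBooks" } } },
    // Average the lower and upper middle elements (the same element when n is odd).
    { $project: { _id: 0, median: { $avg: [
        { $arrayElemAt: ["$counts", { $floor: { $divide: [{ $subtract: [n, 1] }, 2] } }] },
        { $arrayElemAt: ["$counts", { $floor: { $divide: [n, 2] } }] }
    ] } } }
  ];
}

console.log(JSON.stringify(medianBooksPerOwnerPipeline(), null, 2));
```

In the shell this would be run as db.book.aggregate(medianBooksPerOwnerPipeline()). Like the pipeline above, it holds every per-owner count in a single array, so it is only suitable when that array fits within the document size limit.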

answered Sep 24 '22 07:09 by maxiplay