Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the percentile?

I have access logs such as below stored in a mongodb instance:

Time                           Service                      Latency
[27/08/2013:11:19:22 +0000]    "POST Service A HTTP/1.1"    403
[27/08/2013:11:19:24 +0000]    "POST Service B HTTP/1.1"    1022 
[27/08/2013:11:22:10 +0000]    "POST Service A HTTP/1.1"    455 

Is there an analytics function like PERCENTILE_DISC in Oracle to calculate the percentile?

I would like to calculate latency percentiles over a period of time.

like image 282
user2574093 Avatar asked Aug 28 '13 09:08

user2574093


People also ask

What is the formula of a percentile?

Percentiles can be calculated using the formula n = (P/100) x N, where P = percentile, N = number of values in a data set (sorted from smallest to largest), and n = ordinal rank of a given value. Percentiles are frequently used to understand test scores and biometric measurements.

What is the 95% percentile?

The 95th percentile is a number that is greater than 95% of the numbers in a given set. The reason this statistic is so useful in measuring data throughput is that it gives a very accurate picture of the maximum traffic generated on an interface. This is a standard measure used in interpreting performance data.


1 Answers

Starting Mongo 4.4, the $group stage has a new aggregation operator $accumulator allowing custom accumulations of documents as they get grouped, via javascript user defined functions.

Thus, in order to find the 20th percentile:

// { "a" : 25, "b" : 12 }
// { "a" : 89, "b" : 73 }
// { "a" : 25, "b" : 7  }
// { "a" : 25, "b" : 17 }
// { "a" : 89, "b" : 14 }
// { "a" : 89, "b" : 17 }
// { "a" : 25, "b" : 24 }
// { "a" : 25, "b" : 15 }
// { "a" : 25, "b" : 22 }
// { "a" : 25, "b" : 94 }
db.collection.aggregate([
  { $group: {
    _id: "$a",
    percentile: {
      $accumulator: {
        accumulateArgs: ["$b"],
        init: function() { return []; },
        accumulate: function(bs, b) { return bs.concat(b); },
        merge: function(bs1, bs2) { return bs1.concat(bs2); },
        finalize: function(bs) {
          bs.sort(function(a, b) { return a - b });
          return bs[Math.floor(bs.length*.2) + 1];
        },
        lang: "js"
      }
    }
  }}
])
// { "_id" : 89, "percentile" : 17 }
// { "_id" : 25, "percentile" : 15 }

The accumulator:

  • accumulates on the field b (accumulateArgs)
  • is initialised to an empty array (init)
  • accumulates b items in an array (accumulate and merge)
  • and finally performs the percentile calculation on b items (finalize)
like image 185
Xavier Guihot Avatar answered Sep 17 '22 18:09

Xavier Guihot