
How can I calculate the median and standard deviation of a stream of numbers in Perl?

In our logfiles we store the response time of each request. What's the most efficient way to calculate the median response time and figures such as "75/90/95% of requests were served in less than N"? (I guess a variation of my question is: what's the best way to calculate the median and standard deviation of a stream of numbers?)

The best I have come up with is reading all the numbers, sorting them, and then picking out the values at the relevant positions, but that seems really goofy. Isn't there a smarter way?
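
For reference, the read-sort-pick baseline described above might look like the sketch below. It is written in Python rather than Perl, and the function name `percentile_nearest_rank` and the sample response times are made up for illustration. It uses the nearest-rank definition of a percentile, so for an even number of values the "median" is the lower of the two middle values rather than their average.

```python
import math

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (0 < pct <= 100) by sorting all values.

    Nearest-rank method: the smallest value such that at least pct percent
    of the data is less than or equal to it.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)                      # O(n log n), holds everything in memory
    rank = math.ceil(pct * len(ordered) / 100)    # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical response times in milliseconds.
times = [120, 85, 300, 95, 110, 240, 101, 99, 130, 4000]
for pct in (50, 75, 90, 95):
    print(f"{pct}% of requests took at most {percentile_nearest_rank(times, pct)} ms")
```

This gives exact answers but needs memory proportional to the number of requests, which is the cost the question is trying to avoid.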

We use Perl, but solutions for any language might be helpful.

Ask Bjørn Hansen asked Sep 29 '09 07:09




1 Answer

See the article "Calculating Percentiles in Memory-bound Applications". It explains how to calculate the median and other percentiles efficiently.
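
To give a flavour of what a memory-bounded approach can look like, here is a minimal Python sketch that counts values in fixed-width buckets and reads percentiles off the cumulative counts. This is one common technique and is not necessarily the method the article uses. The class name `BucketPercentiles`, the 10 ms bucket width, and the 10 s cap are all assumptions chosen for illustration, and it assumes non-negative response times.

```python
import math

class BucketPercentiles:
    """Approximate percentiles from fixed-width histogram buckets.

    Memory use depends on the number of buckets, not the number of values.
    """

    def __init__(self, bucket_width_ms=10, max_ms=10_000):
        self.width = bucket_width_ms
        # One extra bucket at the end collects everything >= max_ms.
        self.counts = [0] * (max_ms // bucket_width_ms + 1)
        self.total = 0

    def add(self, value_ms):
        index = int(value_ms // self.width)
        index = min(max(index, 0), len(self.counts) - 1)
        self.counts[index] += 1
        self.total += 1

    def percentile(self, pct):
        """Return an upper bound (bucket edge) for the pct-th percentile."""
        if self.total == 0:
            raise ValueError("no values added yet")
        target = math.ceil(pct * self.total / 100)
        seen = 0
        for index, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                # Values in this bucket are below its upper edge
                # (not meaningful for the final overflow bucket).
                return (index + 1) * self.width
        return len(self.counts) * self.width

hist = BucketPercentiles()
for t in [120, 85, 300, 95, 110, 240, 101, 99, 130, 4000]:
    hist.add(t)
print(hist.percentile(50), hist.percentile(90), hist.percentile(95))
```

The answers are only accurate to within one bucket width, so the width is a trade-off between precision and memory.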

Also, here's an article on calculating standard deviation (variance) as you go: "Accurately computing running variance".
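
As a rough illustration of computing variance as you go, one well-known single-pass method is Welford's algorithm, sketched below in Python. The code is not taken from the article, and the class name `RunningStats` is made up for this example. It keeps only three numbers however many values arrive, and it avoids the cancellation problems of the naive "sum of squares minus square of sum" formula.

```python
import math

class RunningStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared differences from the current mean

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the old delta and the updated mean.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (divides by n - 1); 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return self.m2 / (self.count - 1)

    def stddev(self):
        return math.sqrt(self.variance())

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(value)
print(stats.mean, stats.variance(), stats.stddev())
```

Each log line can be fed to `add` as it is read, so the file never has to be held in memory.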

John D. Cook answered Oct 03 '22 09:10
