
Finding and removing outliers in PHP

Tags: algorithm, php

Suppose I sample a selection of database records that return the following numbers:

20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77

Is there an algorithm that can be efficiently implemented in PHP to find the outliers (if there are any) from an array of floats based on how far they deviate from the mean?

eComEvo asked Mar 02 '13 13:03



2 Answers

OK, let's assume your data points are in an array like so:

<?php $dataset = array(20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77); ?>

Then you can use the following function (see the comments for what is happening) to remove every number that falls outside the mean plus or minus the standard deviation multiplied by a magnitude you set (defaults to 1):

<?php

function remove_outliers($dataset, $magnitude = 1) {

  $count = count($dataset);
  if ($count === 0) {
    return $dataset; // Nothing to filter; also avoids a division by zero
  }

  $mean = array_sum($dataset) / $count; // Calculate the mean

  // Population standard deviation: square root of the mean squared deviation
  $squares = array_map("sd_square", $dataset, array_fill(0, $count, $mean));
  $deviation = sqrt(array_sum($squares) / $count) * $magnitude; // Times by magnitude

  // Return the values that lie within $mean +- $deviation (original keys are preserved)
  return array_filter($dataset, function($x) use ($mean, $deviation) {
    return ($x <= $mean + $deviation && $x >= $mean - $deviation);
  });
}

// Squared distance of a single value from the mean
function sd_square($x, $mean) {
  return pow($x - $mean, 2);
}

?>

For your example this function returns the following with a magnitude of 1:

Array
(
    [1] => 80.3
    [2] => 70.95
    [5] => 85.56
    [6] => 69.77
)
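
Two things are worth seeing in the output above: array_filter() keeps the original keys (1, 2, 5, 6), and the result depends heavily on the magnitude. Here is a minimal sketch of a variant that re-indexes the result with array_values(). The name remove_outliers_reindexed() is my own invention for illustration, not part of the function above, and it assumes PHP 7 or later for the type declarations.

```php
<?php

// Hypothetical helper (the name is an assumption): the same mean +/- k*SD
// filter, but the surviving values are re-indexed from 0.
function remove_outliers_reindexed(array $dataset, float $magnitude = 1.0): array
{
    $count = count($dataset);
    if ($count === 0) {
        return [];
    }

    $mean = array_sum($dataset) / $count;

    // Population variance: mean of the squared deviations from the mean.
    $sumOfSquares = 0.0;
    foreach ($dataset as $x) {
        $sumOfSquares += ($x - $mean) ** 2;
    }
    $limit = sqrt($sumOfSquares / $count) * $magnitude;

    // Keep values whose distance from the mean is at most $limit.
    $kept = array_filter($dataset, function ($x) use ($mean, $limit) {
        return abs($x - $mean) <= $limit;
    });

    return array_values($kept); // discard the original keys
}

$dataset = [20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77];

print_r(remove_outliers_reindexed($dataset, 1)); // four values, keys 0..3
print_r(remove_outliers_reindexed($dataset, 2)); // all seven values remain
```

With a magnitude of 2 nothing is removed from the example data, so it is worth trying a few magnitudes against your own records before settling on one.
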
George Reith answered Oct 29 '22 23:10



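For a normally distributed set of data, the following function removes values that lie more than 3 standard deviations from the mean. Note that stats_standard_deviation() comes from the PECL stats extension, so that extension needs to be installed.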

<?php
function remove_outliers($array) {
    if(count($array) == 0) {
      return $array;
    }
    $ret = array();
    $mean = array_sum($array)/count($array);
    $stddev = stats_standard_deviation($array);
    $outlier = 3 * $stddev;
    foreach($array as $a) {
        // Keep the value if it lies within 3 standard deviations of the mean
        if(abs($a - $mean) <= $outlier) {
            $ret[] = $a;
        }
    }
    return $ret;
}
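
If the PECL stats extension is not available, the standard deviation can be computed in plain PHP instead. The following is only a sketch under that assumption: population_stddev() and remove_outliers_3sd() are hypothetical names of my own, and the code divides by the number of values (the population form), which as far as I know is also what stats_standard_deviation() does by default.

```php
<?php

// Hypothetical fallback (the name is an assumption): population standard
// deviation in plain PHP, for systems without the PECL stats extension.
function population_stddev(array $values): float
{
    $count = count($values);
    if ($count === 0) {
        throw new InvalidArgumentException('Cannot compute the deviation of an empty array.');
    }

    $mean = array_sum($values) / $count;

    $sumOfSquares = 0.0;
    foreach ($values as $v) {
        $sumOfSquares += ($v - $mean) ** 2;
    }

    // Divide by $count (population). Use $count - 1 for a sample estimate.
    return sqrt($sumOfSquares / $count);
}

// Hypothetical variant of the function above that uses the fallback.
function remove_outliers_3sd(array $values): array
{
    if (count($values) === 0) {
        return $values;
    }

    $mean = array_sum($values) / count($values);
    $limit = 3 * population_stddev($values);

    $kept = [];
    foreach ($values as $v) {
        // Keep values within 3 standard deviations of the mean.
        if (abs($v - $mean) <= $limit) {
            $kept[] = $v;
        }
    }
    return $kept;
}

$dataset = [20.50, 80.30, 70.95, 15.25, 99.97, 85.56, 69.77];
print_r(remove_outliers_3sd($dataset)); // no value is 3 SDs out, so all seven are kept
```

On the seven values from the question no value is anywhere near 3 standard deviations from the mean, so a 3-sigma cutoff only starts to be useful on larger samples.
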
Philip Whitehouse answered Oct 29 '22 23:10
