 

Javascript: remove outlier from an array?

values = [8160,8160,6160,22684,0,0,60720,1380,1380,57128]

How can I remove outliers such as 0, 57128, 60720 and 22684?

Is there a library which can do this?

asked Dec 28 '13 at 04:12 by javastudent



3 Answers

This all depends on your interpretation of what an "outlier" is. A common approach:

  • High outliers are anything beyond the 3rd quartile + 1.5 * the inter-quartile range (IQR)
  • Low outliers are anything beneath the 1st quartile - 1.5 * IQR

This is also the approach described on Wolfram MathWorld.
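The two rules above only need q1 and q3. As a minimal sketch (not code from this answer), assuming the quartiles have already been estimated, the fences can be computed like this; `tukeyFences`, `isOutlier` and the multiplier parameter `k` (defaulting to the 1.5 used above) are hypothetical names.

```javascript
// Hypothetical helper: lower/upper fences of the "1.5 * IQR" rule.
// Assumes q1 <= q3 have already been estimated from the data.
function tukeyFences(q1, q3, k = 1.5) {
  const iqr = q3 - q1;
  return {
    lower: q1 - k * iqr, // values below this are low outliers
    upper: q3 + k * iqr, // values above this are high outliers
  };
}

// Hypothetical helper: a value is an outlier if it lies strictly outside the fences.
function isOutlier(x, fences) {
  return x < fences.lower || x > fences.upper;
}

const fences = tukeyFences(1380, 57128);
console.log(fences.lower, fences.upper); // -82242 140750
```

For instance, the function in this answer picks q1 = 1380 and q3 = 57128 for the question's sample, which puts the fences at -82242 and 140750, so no value in the array is flagged.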

This is easily wrapped up in a function :) I've tried to write it clearly below; there are obvious refactoring opportunities. Note that, with the quartile estimates used below, your sample contains no outliers under this approach.

function filterOutliers(someArray) {

    // With fewer than 4 values the q3 index below falls outside the
    // array (giving NaN), so just return an unchanged copy
    if (someArray.length < 4) {
        return someArray.concat();
    }

    // Copy the array so that sorting doesn't reorder the caller's array
    var values = someArray.concat();

    // Then sort numerically (the default sort compares values as strings)
    values.sort(function(a, b) {
        return a - b;
    });

    /* Then find a generous IQR. This is generous because if (values.length / 4)
     * is not an int, then really you should average the two elements on either
     * side to find q1.
     */
    var q1 = values[Math.floor(values.length / 4)];
    // Likewise for q3.
    var q3 = values[Math.ceil(values.length * (3 / 4))];
    var iqr = q3 - q1;

    // Then find min and max values
    var maxValue = q3 + iqr * 1.5;
    var minValue = q1 - iqr * 1.5;

    // Then filter anything beyond or beneath these values.
    var filteredValues = values.filter(function(x) {
        return (x <= maxValue) && (x >= minValue);
    });

    // Then return
    return filteredValues;
}
answered Sep 22 '22 at 05:09 by James Peterson



This is an improved version of @james-peterson's solution. It updates the syntax to the current JavaScript standard and finds the two quartiles more robustly (following the formulas at https://de.wikipedia.org/wiki/Interquartilsabstand_(Deskriptive_Statistik) ). It also copies the array faster (see http://jsben.ch/wQ9RU for a performance comparison) and still works when q1 = q3.

function filterOutliers(someArray) {

  if (someArray.length < 4)
    return someArray;

  let values, q1, q3, iqr, maxValue, minValue;

  values = someArray.slice().sort((a, b) => a - b); // copy array fast and sort
  const n = values.length;

  // Find the quartiles. The formulas use 1-based positions,
  // so subtract 1 to get 0-based array indexes.
  if ((n / 4) % 1 === 0) {
    // n/4 is an integer: average the values at positions n*p and n*p + 1
    q1 = 1/2 * (values[n / 4 - 1] + values[n / 4]);
    q3 = 1/2 * (values[n * (3 / 4) - 1] + values[n * (3 / 4)]);
  } else {
    // Otherwise take position ceil(n*p), i.e. 0-based index floor(n*p)
    q1 = values[Math.floor(n / 4)];
    q3 = values[Math.floor(n * (3 / 4))];
  }

  iqr = q3 - q1;
  maxValue = q3 + iqr * 1.5;
  minValue = q1 - iqr * 1.5;

  return values.filter((x) => (x >= minValue) && (x <= maxValue));
}

See this gist: https://gist.github.com/rmeissn/f5b42fb3e1386a46f60304a57b6d215a
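The quartile rule this answer cites is the usual empirical quantile: for sorted data x(1) <= x(2) <= ... <= x(n) and 0 < p < 1, take the mean of x(np) and x(np+1) when np is an integer, otherwise x(ceil(np)). A standalone sketch of just that rule, assuming an ascending-sorted input and 0-based arrays, could look like this; `empiricalQuantile` is a hypothetical name.

```javascript
// Hypothetical helper: empirical p-quantile (0 < p < 1) of an ascending-sorted array.
// 1-based rule: (x[np] + x[np+1]) / 2 if np is an integer, else x[ceil(np)].
// In a 0-based array, 1-based position i is index i - 1.
function empiricalQuantile(sorted, p) {
  const np = sorted.length * p;
  if (Number.isInteger(np)) {
    // 1-based positions np and np + 1 are 0-based indexes np - 1 and np
    return (sorted[np - 1] + sorted[np]) / 2;
  }
  // 1-based ceil(np) is 0-based ceil(np) - 1, which equals floor(np) here
  return sorted[Math.floor(np)];
}

const sample = [8160, 8160, 6160, 22684, 0, 0, 60720, 1380, 1380, 57128];
const sorted = sample.slice().sort((a, b) => a - b);
console.log(empiricalQuantile(sorted, 0.25)); // 1380
console.log(empiricalQuantile(sorted, 0.75)); // 22684
```

On the question's sample this gives q1 = 1380 and q3 = 22684, so the upper fence is 22684 + 1.5 * 21304 = 54640 and both 57128 and 60720 fall outside it.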

answered Sep 23 '22 at 05:09 by Roy



I had some problems with the other two solutions, such as q1 and q3 coming out as NaN because of wrong indexes. Since arrays are 0-indexed, the quantile's position has to be computed from the array length minus 1. The code then checks whether that position is an integer or has a fractional part; if it is fractional, the value is interpolated between the two neighbouring elements.

function filterOutliers (someArray) {
    if (someArray.length < 4) {
        return someArray;
    }

    let values = someArray.slice().sort((a, b) => a - b); // copy array fast and sort

    let q1 = getQuantile(values, 25);
    let q3 = getQuantile(values, 75);

    let iqr, maxValue, minValue;
    iqr = q3 - q1;
    maxValue = q3 + iqr * 1.5;
    minValue = q1 - iqr * 1.5;

    return values.filter((x) => (x >= minValue) && (x <= maxValue));
}

function getQuantile (array, quantile) {
    // Position of the quantile (given in percent) in the sorted array.
    // Arrays are 0-indexed, so positions run from 0 to length - 1.
    let index = quantile / 100.0 * (array.length - 1);

    // Integer position: the quantile is exactly that element.
    if (index % 1 === 0) {
        return array[index];
    } else {
        // Index of the element just below the position.
        let lowerIndex = Math.floor(index);
        // Fractional part of the position.
        let remainder = index - lowerIndex;
        // Linearly interpolate between the two neighbouring elements.
        return array[lowerIndex] + remainder * (array[lowerIndex + 1] - array[lowerIndex]);
    }
}
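To see how much the choice of quartile estimate matters, here is a self-contained sketch that applies this answer's interpolation rule (position p * (n - 1)) to the question's sample. It takes p as a fraction rather than a percentage, and `interpolatedQuantile` is a hypothetical name.

```javascript
// Hypothetical helper: linearly interpolated quantile at position p * (n - 1),
// the same rule as getQuantile above but with p given as a fraction (0..1).
// Assumes the input is sorted ascending.
function interpolatedQuantile(sorted, p) {
  const index = p * (sorted.length - 1);
  const lower = Math.floor(index);
  const fraction = index - lower;
  if (fraction === 0) return sorted[lower];
  return sorted[lower] + fraction * (sorted[lower + 1] - sorted[lower]);
}

const sample = [8160, 8160, 6160, 22684, 0, 0, 60720, 1380, 1380, 57128];
const sorted = sample.slice().sort((a, b) => a - b);
const q1 = interpolatedQuantile(sorted, 0.25); // 1380
const q3 = interpolatedQuantile(sorted, 0.75); // 19053
const iqr = q3 - q1;
// Keep only values inside the 1.5 * IQR fences
const kept = sorted.filter((x) => x >= q1 - 1.5 * iqr && x <= q3 + 1.5 * iqr);
// kept -> [0, 0, 1380, 1380, 6160, 8160, 8160, 22684]
```

Here the upper fence is 45562.5, so 57128 and 60720 are dropped while 0 and 22684 are kept, a different result from the first answer's quartile estimate on the same data.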
answered Sep 25 '22 at 05:09 by A. van Hugten
