Algorithm to remove extreme outliers in array

I've got an array which I use for the x-axis in a D3 graph, and it blows up because the chart is too small for the size of the array. I had a look at the data and there are extreme outliers in it. See the chart below.

[chart]

There is data around 0 (it's not exactly zero; it's 0.00972 etc.).

The data starts getting interesting around 70, then there are massive spikes at about 100. The data then continues, and the same sort of thing happens on the other side at about 200.

Can anyone help me with an algorithm that removes the extreme outliers? E.g. give me the 90% or 95% percentiles and remove the contiguous elements (i.e. not just one element from the middle, but x elements from the start and the end of the array, where x depends on working out where best to cut based on the data). In JavaScript as well, please!

Thanks!

PS: you'll need to save the image to view it properly.

asked Dec 07 '22 02:12 by JML


1 Answer

Assuming the data is like

var data = [0.00972, 70, 70, ...];

first sort

data.sort(function(a,b){return a-b});

then take off the bottom 2.5% and top 2.5%

var l = data.length;
var low = Math.round(l * 0.025);
var high = l - low;
var data2 = data.slice(low,high);
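The same trimming can be wrapped in a function that sorts a copy, so the original array keeps its order. This is only a minimal sketch: the name `trimPercentiles` and its `fraction` parameter are made up for illustration, and the sample values are invented.

```javascript
// Hypothetical helper: returns a sorted copy of `values` with the lowest and
// highest `fraction` of the elements removed. The input array is not modified.
function trimPercentiles(values, fraction) {
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var cut = Math.round(sorted.length * fraction); // elements to drop per side
    return sorted.slice(cut, sorted.length - cut);
}

// Made-up sample: the integers 1..40
var sample = [];
for (var i = 1; i <= 40; ++i) sample.push(i);

// 2.5% of 40 is 1, so one element is dropped from each end
var trimmed = trimPercentiles(sample, 0.025);
```

Passing 0.05 instead of 0.025 would keep the middle 90% rather than the middle 95%.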

An alternative would be to only show data within 3 standard deviations of the mean. If your data is normally distributed, about 99.7% of it will fall in this range.

var sum=0;     // stores sum of elements
var sumsq = 0; // stores sum of squares
for(var i=0;i<data.length;++i) {
    sum+=data[i];
    sumsq+=data[i]*data[i];
}
var mean = sum / data.length;
var variance = sumsq / data.length - mean * mean; // population variance: E[x^2] - mean^2
var sd = Math.sqrt(variance);
var data3 = []; // holds the values within 3 standard deviations of the mean
for(var i=0;i<data.length;++i) {
    if(data[i]> mean - 3 *sd && data[i] < mean + 3 *sd)
        data3.push(data[i]);
}
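As a rough sketch, the standard deviation filter can also be written as a function that does not depend on the array being sorted and keeps the original order. The name `withinStdDevs` and the parameter `k` are hypothetical, and the readings are invented. Two small deviations from the loop above, both assumptions on my part: the variance is clamped at 0 to guard against floating-point rounding, and the comparison is `<=` so that an array of identical values is kept rather than emptied.

```javascript
// Hypothetical helper: keeps the values that lie within `k` standard
// deviations of the mean, in their original order.
function withinStdDevs(values, k) {
    var n = values.length;
    var sum = 0;   // sum of elements
    var sumsq = 0; // sum of squares
    for (var i = 0; i < n; ++i) {
        sum += values[i];
        sumsq += values[i] * values[i];
    }
    var mean = sum / n;
    // Clamp at 0: rounding can make the difference slightly negative
    var sd = Math.sqrt(Math.max(0, sumsq / n - mean * mean));
    return values.filter(function (v) {
        return Math.abs(v - mean) <= k * sd;
    });
}

// Made-up readings: twenty values of 10 and a single spike of 1000
var readings = [];
for (var j = 0; j < 20; ++j) readings.push(10);
readings.push(1000);

var filtered = withinStdDevs(readings, 3); // the spike is dropped
```

Note that with very few data points a single spike inflates the standard deviation so much that it may still fall inside the 3 sd band.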

Or do something similar using some multiple of the interquartile range:

// data must already be sorted (see the sort above)
var n = data.length;
var median = data[Math.floor(n/2)]; // floor keeps every index inside the array
var LQ = data[Math.floor(n/4)];
var UQ = data[Math.floor(3*n/4)];
var IQR = UQ-LQ;
var data4 = [];
for(var i=0;i<data.length;++i) {
    if(data[i]> median - 2 * IQR && data[i] < median + 2 * IQR)
        data4.push(data[i]);
}
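All three approaches above filter by value, so they either reorder the array (the sort) or drop elements from the middle. If the goal is to cut only a contiguous run from the start and the end of the unsorted array, one possible sketch is to take a lower and upper bound from any of the methods above and walk inwards from both ends. The function name `trimEnds`, the bounds, and the series are all invented for illustration.

```javascript
// Hypothetical helper: drops the contiguous run of out-of-range values at
// each end of `values` and keeps everything in between in its original order.
function trimEnds(values, lower, upper) {
    var start = 0;
    var end = values.length;
    // Advance past out-of-range values at the start
    while (start < end && (values[start] < lower || values[start] > upper)) ++start;
    // Step back past out-of-range values at the end
    while (end > start && (values[end - 1] < lower || values[end - 1] > upper)) --end;
    return values.slice(start, end);
}

// Made-up series with extreme values at both ends and one spike in the middle
var series = [500, 400, 70, 72, 900, 71, 69, 450, 600];
var kept = trimEnds(series, 60, 350);
// kept is [70, 72, 900, 71, 69]: the interior spike stays, only the ends are cut
```

The bounds could be, for example, `mean - 3 * sd` and `mean + 3 * sd`, or the values at the 2.5% and 97.5% positions of a sorted copy.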
answered Dec 21 '22 16:12 by Salix alba