
Filtering out outliers in Pandas dataframe with rolling median

I am trying to filter out some outliers from a scatter plot of GPS elevation displacements against dates.

I'm trying to use df.rolling to compute a median and standard deviation for each window, and then remove a point if it lies more than 3 standard deviations from the median.

However, I can't figure out a way to loop through the column and compare each value to the rolling median.

Here is the code I have so far:

import pandas as pd
import numpy as np

def median_filter(df, window):
    cnt = 0
    median = df['b'].rolling(window).median()
    std = df['b'].rolling(window).std()
    for row in df.b:
        # compare each value to its median
        pass

df = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns = ['a', 'b'])

median_filter(df, 10)

How can I loop through and compare each point and remove it?

p0ps1c1e asked Oct 26 '17 21:10



1 Answer

Just filter the dataframe with a boolean mask built from the rolling statistics; no loop is needed:

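window = 10  # window size used in the question

# Rolling median and standard deviation of column 'b'
df['median'] = df['b'].rolling(window).median()
df['std'] = df['b'].rolling(window).std()

# Keep only rows within 3 rolling standard deviations of the rolling median.
# The first window-1 rows have NaN statistics, so they are dropped as well.
df = df[(df.b <= df['median'] + 3*df['std']) & (df.b >= df['median'] - 3*df['std'])]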
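
For reference, here is a minimal, self-contained sketch of the same idea wrapped in a function. The function name `rolling_median_filter`, the `n_sigmas` parameter, and the synthetic data are made up for illustration. Using `center=True` with `min_periods=1` is an assumption that goes beyond the snippet above: it compares each point with its neighbours on both sides and avoids losing the first rows to NaN statistics. Note that a spike inflates the standard deviation of its own window, so a very short window may fail to flag it.

```python
import numpy as np
import pandas as pd


def rolling_median_filter(df, column, window, n_sigmas=3):
    """Return the rows of df whose `column` value lies within n_sigmas
    rolling standard deviations of the rolling median (hypothetical helper)."""
    # Assumption: a centered window with min_periods=1, not the trailing default
    rolling = df[column].rolling(window, center=True, min_periods=1)
    median = rolling.median()
    std = rolling.std()
    # Comparisons against NaN are False, so rows without a valid std are dropped
    keep = (df[column] - median).abs() <= n_sigmas * std
    return df[keep]


# Made-up data: a noisy flat signal with one injected spike at row 40
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": np.arange(100), "b": rng.normal(50.0, 1.0, size=100)})
df.loc[40, "b"] = 500.0

filtered = rolling_median_filter(df, "b", window=20)
print(len(df), len(filtered))
```

Because the function returns a filtered copy and never adds helper columns, the original dataframe stays unchanged and can be re-filtered with a different window or threshold.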
DJK answered Oct 12 '22 18:10