Filtering out outliers in Pandas dataframe with rolling median

I am trying to filter out some outliers from a scatter plot of GPS elevation displacements plotted against dates.

I'm trying to use df.rolling to compute a median and standard deviation for each window, and then remove a point if it is more than 3 standard deviations away from the rolling median.

However, I can't figure out a way to loop through the column and compare each value to the rolling median calculated for it.

Here is the code I have so far:

import pandas as pd
import numpy as np

def median_filter(df, window):
    cnt = 0
    median = df['b'].rolling(window).median()
    std = df['b'].rolling(window).std()
    for row in df.b:
        # compare each value to its median
        pass  # placeholder: the comparison is not implemented yet

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=['a', 'b'])

median_filter(df, 10)

How can I loop through and compare each point and remove it?

asked Oct 26 '17 21:10

p0ps1c1e

1 Answer

Just filter the dataframe with a boolean mask; no loop is needed:

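window = 10  # same window size as in the question

df['median'] = df['b'].rolling(window).median()
df['std'] = df['b'].rolling(window).std()

# keep only rows within 3 rolling standard deviations of the rolling median;
# the first window-1 rows have NaN statistics, so they are dropped as well
df = df[(df.b <= df['median'] + 3*df['std']) & (df.b >= df['median'] - 3*df['std'])]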
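
For reference, the same idea can be wrapped in a function that leaves the original dataframe untouched. This is only a sketch: the name rolling_median_filter, its parameters, and the sample data are made up for illustration, and the choice of center=True with min_periods=1 is an assumption that differs from the snippet above, which uses the default trailing window. A centered window compares each point with neighbours on both sides, and min_periods=1 keeps the rows at the edges instead of dropping them.

```python
import numpy as np
import pandas as pd


def rolling_median_filter(df, column, window, n_sigmas=3):
    """Hypothetical helper: return the rows of df whose value in `column`
    lies within n_sigmas rolling standard deviations of the rolling median."""
    values = df[column]
    # Assumption: centered window, and partial windows allowed at the edges.
    roll = values.rolling(window, center=True, min_periods=1)
    median = roll.median()
    std = roll.std()
    # Boolean mask; comparisons against NaN are False, so such rows are dropped.
    keep = (values - median).abs() <= n_sigmas * std
    return df[keep]


# Synthetic data, made up for illustration: a slow ramp with one spike.
elevation = np.linspace(100.0, 101.0, 100)
elevation[50] += 50.0
df = pd.DataFrame({"a": np.arange(100), "b": elevation})

filtered = rolling_median_filter(df, "b", window=20)
print(len(df), len(filtered))
```

One thing to watch: the outlier itself inflates the rolling standard deviation of every window that contains it. For a single spike in otherwise flat data, the sample standard deviation of the window is roughly the spike height divided by sqrt(window), so a 3-sigma test can only flag the spike when the window holds more than 9 points. A larger window, as in the example, gives more margin.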

answered Oct 12 '22 18:10

DJK

Tags: pandas, outliers, median, rolling-computation
