Computing MAD (mean absolute deviation) with GroupBy in Pandas

Question

I have a dataframe:

Type Name Cost
  A   X    545
  B   Y    789
  C   Z    477
  D   X    640
  C   X    435
  B   Z    335
  A   X    850
  B   Y    152

My dataframe contains all such combinations of Type ['A','B','C','D'] and Name ['X','Y','Z']. I used the groupby method to get statistics for each specific combination, such as A-X, A-Y, A-Z. Here's some code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': ['A','B','C','D','C','B','A','B'], 'Name': ['X','Y','Z','X','X','Z','X','Y'], 'Cost': [545,789,477,640,435,335,850,152]})
df.groupby(['Name','Type']).agg(['mean','std'])
# need to use mad instead of std

I need to eliminate the observations that are more than 3 MADs away from the mean; something like:

test = df[np.abs(df.Cost-df.Cost.mean())<=(3*df.Cost.mad())]

I am confused by this, as df.Cost.mad() returns the MAD of Cost over the entire data rather than for a specific Type-Name category. How can I combine the two?

Julien Spronck · Accepted Answer

You can use groupby and transform to create new data series that can be used to filter out your data.

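groups = df.groupby(['Name','Type'])
# Series.mad() was removed in pandas 2.0, so the mean absolute deviation is computed explicitly
mad = groups['Cost'].transform(lambda x: (x - x.mean()).abs().mean())
# absolute deviation of each row from its own group's mean
dif = groups['Cost'].transform(lambda x: np.abs(x - x.mean()))
# keep only the rows within 3 MADs of their group mean
df2 = df[dif <= 3*mad]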

However, in this case no row is filtered out: each group has at most two rows, so every row's deviation from its group mean is equal to the group's mean absolute deviation (both are zero for single-row groups).
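
To see the filter actually drop a row, a group needs enough rows for a single deviation to exceed three times the group's MAD. The following is a minimal sketch on made-up data (the values and the helper name mean_abs_dev are illustrative assumptions, not part of the question). It computes the MAD explicitly rather than calling Series.mad(), which is not available in pandas 2.0 and later, and it also shows how the same helper could be passed to agg in place of std.

```python
import pandas as pd


def mean_abs_dev(x: pd.Series) -> float:
    """Mean absolute deviation of x around its own mean."""
    return (x - x.mean()).abs().mean()


# Hypothetical data: group A-X has nine rows, one of them extreme;
# group B-Y has three unremarkable rows.
df = pd.DataFrame({
    'Type': ['A'] * 9 + ['B'] * 3,
    'Name': ['X'] * 9 + ['Y'] * 3,
    'Cost': [100] * 8 + [1000] + [50, 55, 60],
})

groups = df.groupby(['Name', 'Type'])['Cost']

# Per-group MAD, broadcast back to the shape of the original column
group_mad = groups.transform(mean_abs_dev)

# Absolute deviation of each row from its own group's mean
group_dev = (df['Cost'] - groups.transform('mean')).abs()

# Keep rows within 3 MADs of their group mean; here only the
# Cost == 1000 row in group A-X falls outside that range
filtered = df[group_dev <= 3 * group_mad]

# Per-group summary using the MAD helper instead of 'std'
summary = groups.agg(['mean', mean_abs_dev])
```

With the mean-based MAD, a lone outlier in a group of n rows deviates by roughly n/2 MADs, so a threshold of 3 can only reject it once the group has more than six rows. That may be worth keeping in mind when groups are small.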

Tags:

python

pandas

dataframe

group-by

aggregate

Hypothetical Ninja

1 Answers

Julien Spronck
