I have a DataFrame with a column that has some bad data with various negative values. I would like to replace values < 0 with the mean of the group that they are in.
For missing values as NAs, I would do:
data = df.groupby(['GroupID']).column
data.transform(lambda x: x.fillna(x.mean()))
But how can I do this operation on a condition like x < 0?
Thanks!
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
groupby is one of the most frequently used pandas functions in data analysis. It groups the data points (i.e. rows) based on the distinct values in the given column or columns. We can then calculate aggregated values for the generated groups.
transform calls a function on each group and returns a DataFrame with the same index as the original object, filled with the transformed values.
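To make the difference between aggregating and transforming concrete, here is a minimal sketch. The frame and the names group_means and row_means are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical toy data: two groups in column 'a'.
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 3, 5, 9]})

# Aggregation: one value per group, indexed by the group key.
group_means = df.groupby("a")["b"].mean()

# Transform: one value per original row, indexed like df.
row_means = df.groupby("a")["b"].transform("mean")

print(group_means)
print(row_means)
```

Because the transformed result shares the original index, it can be combined row by row with the original column, which is what the fillna idiom in the question relies on.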
Here's one way to do it (for the 'b' column, in this boring example):
In [1]: df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
In [2]: df
Out[2]:
   a  b
0  1  1
1  1 -1
2  2  1
3  2  2
Replace those negative values with NaN, and then calculate the mean (b) in each group:
In [3]: df['b'] = df.b.apply(lambda x: x if x >= 0 else float('nan'))  # pd.np was removed from recent pandas
In [4]: m = df.groupby('a').mean().b
Then use apply across each row to replace each NaN with its group's mean:
In [5]: df['b'] = df.apply(lambda row: m[row['a']]
                                       if pd.isnull(row['b'])
                                       else row['b'],
                           axis=1)
In [6]: df
Out[6]:
   a  b
0  1  1
1  1  1
2  2  1
3  2  2
Using @AndyHayden's example, you could use groupby/transform with replace:
df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
print(df)
#    a  b
# 0  1  1
# 1  1 -1
# 2  2  1
# 3  2  2
data = df.groupby(['a'])
def replace(group):
    # Work on a copy so the object passed in by transform is not mutated.
    group = group.copy()
    mask = group < 0
    # Select those values where it is < 0, and replace
    # them with the mean of the values which are not < 0.
    group[mask] = group[~mask].mean()
    return group
print(data.transform(replace))
#    b
# 0  1
# 1  1
# 2  1
# 3  2
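A vectorized variant of the same idea may also work, staying close to the fillna idiom from the question. This is only a sketch on the example frame above: the names valid and group_mean are illustrative, and it assumes every group has at least one non-negative value (otherwise that group's mean is NaN and its negatives end up as NaN).

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Turn negative values into NaN so that mean() ignores them.
valid = df["b"].mask(df["b"] < 0)

# Mean of the non-negative values in each group, repeated for every row.
group_mean = valid.groupby(df["a"]).transform("mean")

# Fill only the masked positions with their group's mean.
df["b"] = valid.fillna(group_mean)
print(df)
```

Note that the column becomes float, since NaN is introduced along the way.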