Replacing values with groupby means

Q: What does groupby mean and what is it used for?

A groupby operation involves some combination of splitting the object into groups, applying a function to each group, and combining the results. This makes it possible to group large amounts of data and compute operations on each group. The key (or keys) passed to groupby determine the groups.
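To make the three steps concrete, here is a small sketch on made-up data (the team and score columns are hypothetical). It performs the split, apply and combine steps by hand, then shows that a single groupby call produces the same per-group means:

```python
import pandas as pd

# Hypothetical example data: a grouping key and a value column
df = pd.DataFrame({"team": ["x", "x", "y", "y", "y"],
                   "score": [1, 3, 2, 4, 6]})

# Split: one sub-frame per distinct key, done by hand with boolean masks
pieces = {key: df[df["team"] == key] for key in df["team"].unique()}

# Apply: compute the mean score of each piece
means = {key: piece["score"].mean() for key, piece in pieces.items()}

# Combine: collect the per-group results into a single Series
manual = pd.Series(means, name="score")

# groupby does the split, apply and combine steps in one call
auto = df.groupby("team")["score"].mean()

print(manual)
print(auto)
```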

Q: What does groupby mean in pandas?

groupby is one of the most frequently used pandas functions in data analysis. It groups the data points (i.e. rows) based on the distinct values in the given column or columns; aggregated values can then be calculated for each of the resulting groups.
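A minimal sketch of grouping on more than one column and computing several aggregates at once, assuming some hypothetical region/product/units data. Each distinct (region, product) pair becomes one group, and agg computes one row of results per group:

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["a", "b", "a", "a", "b"],
    "units": [10, 5, 7, 3, 8],
})

# Group on two columns, then aggregate the units column three ways
summary = df.groupby(["region", "product"])["units"].agg(["sum", "mean", "count"])
print(summary)
```

The result is indexed by the (region, product) pairs, so ("south", "a") holds the combined figures for the two matching rows.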

Q: What does groupby transform do?

transform calls a function on each group that produces a same-indexed result, and returns an object with the same index as the original, filled with the transformed values.
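The difference from an ordinary aggregation is easiest to see side by side. In this sketch (hypothetical columns g and v), mean() returns one value per group, while transform("mean") returns one value per original row, so it can be assigned straight back as a new column:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1.0, 3.0, 5.0]})

# Aggregation: one row per group (index: a, b)
agg = df.groupby("g")["v"].mean()

# transform: same index as df, each row gets the mean of its own group
per_row = df.groupby("g")["v"].transform("mean")

# Because the indexes line up, the result can be assigned directly
df["v_group_mean"] = per_row
print(agg)
print(df)
```

This same-index property is what makes transform a natural fit for filling values with their group's mean.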

I have a DataFrame with a column that has some bad data with various negative values. I would like to replace values < 0 with the mean of the group that they are in.

For missing values (NaN), I would do:

data = df.groupby('GroupID')['column']
df['column'] = data.transform(lambda x: x.fillna(x.mean()))

But how to do this operation on a condition like x < 0?

Thanks!

Here's one way to do it (for the 'b' column, in this boring example):

In [1]: df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
In [2]: df
Out[2]: 
   a  b
0  1  1
1  1 -1
2  2  1
3  2  2

Replace those negative values with NaN, and then calculate the mean of b in each group:

In [3]: df['b'] = df.b.where(df.b >= 0)  # negative values become NaN
In [4]: m = df.groupby('a').mean().b

Then use apply across each row to replace each NaN with its group's mean:

In [5]: df['b'] = df.apply(lambda row: m[row['a']]
                                       if pd.isnull(row['b'])
                                       else row['b'],
                           axis=1) 
In [6]: df
Out[6]: 
   a  b
0  1  1
1  1  1
2  2  1
3  2  2
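The row-wise apply above is easy to read but calls a Python function once per row. A possibly faster variant (an alternative sketch, not part of the original answer) looks up each row's group mean with Series.map and fills only the NaN entries with fillna, in one vectorized step:

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Step 1: turn negative values into NaN
df["b"] = df["b"].where(df["b"] >= 0)

# Step 2: per-group mean of the remaining values (mean() skips NaN)
m = df.groupby("a")["b"].mean()

# Step 3: map each row's group key to its group mean, use it only where b is NaN
df["b"] = df["b"].fillna(df["a"].map(m))
print(df)
```

Note that b becomes a float column once NaN has been introduced.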

Using @AndyHayden's example, you could use groupby/transform with a custom replace function:

import pandas as pd

df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
print(df)
#    a  b
# 0  1  1
# 1  1 -1
# 2  2  1
# 3  2  2

def replace(group):
    # group is the Series of 'b' values belonging to one value of 'a'
    mask = group < 0
    # Select those values where it is < 0, and replace
    # them with the mean of the values which are not < 0.
    return group.mask(mask, group[~mask].mean())

df['b'] = df.groupby('a')['b'].transform(replace)
print(df)
#    a  b
# 0  1  1
# 1  1  1
# 2  2  1
# 3  2  2
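Another option, sketched here as an alternative rather than either answerer's code, is to reduce the problem to the NaN case the question already knows how to handle: mark the negatives as NaN, then reuse the question's transform(lambda x: x.fillna(x.mean())) pattern unchanged:

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Negative entries become NaN; everything else is kept
b = df["b"].where(df["b"] >= 0)

# Same pattern as in the question: fill NaNs with the mean of their group
df["b"] = b.groupby(df["a"]).transform(lambda x: x.fillna(x.mean()))
print(df)
```

If every value in a group is negative, that group's mean is NaN, so its entries stay NaN after the fill.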

Tags: python, pandas, pandas-groupby

Def_Os

2 Answers

Andy Hayden

unutbu
