Remove groups with size smaller than mean group size in pandas

I have the following dataframe:

import pandas as pd

df = pd.DataFrame.from_dict({'case': ['foo', 'foo', 'foo', 'foo', 'bar'],
                             'cluster': [1, 1, 1, 2, 1],
                             'conf': [1, 2, 3, 1, 1]})

df
Out[3]: 
  case  cluster  conf
0  foo        1     1
1  foo        1     2
2  foo        1     3
3  foo        2     1
4  bar        1     1

If I group by 'case' and 'cluster', I can remove the elements belonging to groups with only 1 element:

df.groupby(['case', 'cluster']).filter(lambda x: len(x) > 1)
Out[4]: 
  case  cluster  conf
0  foo        1     1
1  foo        1     2
2  foo        1     3
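To see what the predicate passed to filter actually receives, here is a small sketch that records each group's key and length. The helper keep_if_larger_than_one and the list seen are hypothetical names for illustration. Per the DataFrameGroupBy.filter documentation, each sub-frame handed to the function carries a name attribute holding its group key, which is a (case, cluster) tuple when grouping by two columns.

```python
import pandas as pd

df = pd.DataFrame({'case': ['foo', 'foo', 'foo', 'foo', 'bar'],
                   'cluster': [1, 1, 1, 2, 1],
                   'conf': [1, 2, 3, 1, 1]})

# Hypothetical log of (group key, group length) pairs seen by the predicate
seen = []

def keep_if_larger_than_one(group):
    # group is the sub-DataFrame for one (case, cluster) pair;
    # group.name is that pair as a tuple, e.g. ('foo', 1)
    seen.append((group.name, len(group)))
    return len(group) > 1

kept = df.groupby(['case', 'cluster']).filter(keep_if_larger_than_one)
print(seen)
print(kept)
```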

I can also compute the mean number of elements per group for each 'case' value:

df.groupby(['case', 'cluster']).size().groupby(level='case').mean()
Out[5]: 
case
bar    1.0
foo    2.0
dtype: float64
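The mean above is taken over groups, not over rows: each (case, cluster) group contributes its size exactly once. A short sketch making the intermediate group sizes visible (sizes and mean_per_case are illustrative names; it uses groupby(level=...) because the level keyword of Series.mean was removed in pandas 2.0):

```python
import pandas as pd

df = pd.DataFrame({'case': ['foo', 'foo', 'foo', 'foo', 'bar'],
                   'cluster': [1, 1, 1, 2, 1],
                   'conf': [1, 2, 3, 1, 1]})

# One entry per (case, cluster) group: its number of rows
sizes = df.groupby(['case', 'cluster']).size()

# Average those group sizes within each 'case' (foo: (3 + 1) / 2, bar: 1 / 1)
mean_per_case = sizes.groupby(level='case').mean()

print(sizes)
print(mean_per_case)
```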

But how can I filter out the rows belonging to groups with fewer elements than the corresponding mean value? The output I am expecting is:

  case  cluster  conf
0  foo        1     1
1  foo        1     2
2  foo        1     3
4  bar        1     1

You can use the name attribute of each group to look up the mean group size for its 'case' while using filter. Inside filter, x.name is the group key, here a (case, cluster) tuple, so x.name[0] is the 'case' value:

grp_mean = df.groupby(['case', 'cluster']).size().groupby(level='case').mean()
df = df.groupby(['case', 'cluster']).filter(lambda x: len(x) >= grp_mean[x.name[0]])

As pointed out by @MaxU, this could be slightly sped up by factoring out the groupby:

g = df.groupby(['case', 'cluster'])
grp_mean = g.size().groupby(level='case').mean()
df = g.filter(lambda x: len(x) >= grp_mean[x.name[0]])

The resulting output:

  case  cluster  conf
0  foo        1     1
1  foo        1     2
2  foo        1     3
4  bar        1     1
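When there are many groups, calling a Python lambda once per group can be slow. A vectorized alternative (a different technique from the filter answer, sketched here with the same column names) broadcasts each row's group size with transform('size') and compares it with its case's mean group size, looked up with Series.map. The mean is still computed from one size per group; averaging the broadcast per-row sizes instead would weight larger groups more (foo would get 2.5 instead of 2.0).

```python
import pandas as pd

df = pd.DataFrame({'case': ['foo', 'foo', 'foo', 'foo', 'bar'],
                   'cluster': [1, 1, 1, 2, 1],
                   'conf': [1, 2, 3, 1, 1]})

g = df.groupby(['case', 'cluster'])

# Size of the (case, cluster) group each row belongs to, aligned with df's index
row_group_size = g['conf'].transform('size')

# Mean group size per 'case', computed from one size per group
mean_per_case = g.size().groupby(level='case').mean()

# Each row's threshold: the mean group size of its own 'case'
threshold = df['case'].map(mean_per_case)

result = df[row_group_size >= threshold]
print(result)
```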

a = 2
b = 1
pd.concat([df[(df.conf >= a) & (df.case == 'foo')],
           df[(df.conf >= b) & (df.case == 'bar')]])

  case  cluster  conf
1  foo        1     2
2  foo        1     3
4  bar        1     1
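The per-case thresholds in this answer can also be expressed without one concat piece per case, by mapping each row's 'case' to its threshold and applying a single boolean mask. A minimal sketch, assuming a hypothetical thresholds dict that holds the answer's values (a = 2 for 'foo', b = 1 for 'bar'); like the answer, it compares conf values, not group sizes, against the threshold.

```python
import pandas as pd

df = pd.DataFrame({'case': ['foo', 'foo', 'foo', 'foo', 'bar'],
                   'cluster': [1, 1, 1, 2, 1],
                   'conf': [1, 2, 3, 1, 1]})

# Hypothetical per-case minimum conf values, matching a and b above
thresholds = {'foo': 2, 'bar': 1}

# Row-wise threshold for each row's 'case', then a single boolean mask
result = df[df['conf'] >= df['case'].map(thresholds)]
print(result)
```

Cases missing from thresholds map to NaN, and a comparison with NaN is False, so such rows are dropped; the concat version likewise keeps only the cases it lists.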

Tags: python, pandas

saltimbanqui

root

galaxyan
