Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select first N number of groups based on values of a column conditionally?

This is my DataFrame:

import pandas as pd 
df = pd.DataFrame(
    {
        'a': [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 10, 22],
        'b': [1, 1, 1, -1, -1, -1, -1, 2, 2, 2, 2, -1, -1, -1, -1],
        'c': [25, 25, 25, 45, 45, 45, 45, 65, 65, 65, 65, 40, 40, 30, 30]
    }
)

The expected output: Grouping df by c and a condition:

     a  b   c
0   10  1  25
1   15  1  25
2   20  1  25
3   25 -1  45
4   30 -1  45
5   35 -1  45
6   40 -1  45
11  65 -1  40
12  70 -1  40

The process is as follows:

a) Selecting the group that all of the b values is 1. In my data and this df there is only one group with this condition.

b) Selecting first two groups (from top of df) that all of their b values are -1.

For example:

a) Group 25 is selected.

b) There are three groups with this condition. First two groups are: Group 45 and 40.

Note that there is a possibility in my data that there are no groups that has a or b condition. If that is the case, returning whatever matches the criteria is fine. For example the output could be only one group or no groups at all.

The groups that I want are shown below:

enter image description here

These are my attempts that got very close:

df1 = df.groupby('c').filter(lambda g: g.b.eq(1).all())
gb = df.groupby('c')
new_gb = pd.concat([gb.get_group(group) for i, group in enumerate(gb.groups) if i < 2])
like image 708
Amir Avatar asked Sep 18 '25 11:09

Amir


1 Answers

You can use custom masks for boolean indexing:

# identify groups with all 1
m1 = df['b'].eq(1).groupby(df['c']).transform('all')
# identify groups with all -1
m2 = df['b'].eq(-1).groupby(df['c']).transform('all')
# keep rows of first 2 groups with all -1
m3 = df['c'].isin(df.loc[m2, 'c'].unique()[:2])

# select m1 OR m3
out = df[m1 | m3]

Or, for a variant without groupby, using set operations:

# identify rows with 1/-1
m1 = df['b'].eq(1)
m2 = df['b'].eq(-1)

# drop c that have values other that 1/-1:  {65}
# drop -1 groups after 2nd occurrence:      {30}
drop = set(df.loc[~(m1|m2), 'c']) | set(df.loc[m2, 'c'].unique()[2:])

out = df[~df['c'].isin(drop)]

Output:

     a  b   c
0   10  1  25
1   15  1  25
2   20  1  25
3   25 -1  45
4   30 -1  45
5   35 -1  45
6   40 -1  45
11  65 -1  40
12  70 -1  40

Intermediates (first approach):

     a  b   c     m1     m2     m3
0   10  1  25   True  False  False
1   15  1  25   True  False  False
2   20  1  25   True  False  False
3   25 -1  45  False   True   True
4   30 -1  45  False   True   True
5   35 -1  45  False   True   True
6   40 -1  45  False   True   True
7   45  2  65  False  False  False
8   50  2  65  False  False  False
9   55  2  65  False  False  False
10  60  2  65  False  False  False
11  65 -1  40  False   True   True
12  70 -1  40  False   True   True
13  10 -1  30  False   True  False
14  22 -1  30  False   True  False

Intermediates (second approach):

     a  b   c     m1     m2  ~isin(drop)
0   10  1  25   True  False         True
1   15  1  25   True  False         True
2   20  1  25   True  False         True
3   25 -1  45  False   True         True
4   30 -1  45  False   True         True
5   35 -1  45  False   True         True
6   40 -1  45  False   True         True
7   45  2  65  False  False        False
8   50  2  65  False  False        False
9   55  2  65  False  False        False
10  60  2  65  False  False        False
11  65 -1  40  False   True         True
12  70 -1  40  False   True         True
13  10 -1  30  False   True        False
14  22 -1  30  False   True        False
like image 173
mozway Avatar answered Sep 21 '25 03:09

mozway