Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting subset of pandas groupby dataframe where more than one key has values

I have a dataframe of values that looked like this:

    mode   journey  stage
0    BUS   1        1
1    RTS   1        2
2    BUS   2        1
3    RTS   3        1
4    BUS   3        2
5    BUS   4        1

I did a groupby ['journey','mode'] and obtained this:

g=df.groupby(['journey','mode'])  
g.size()

journey  mode
1        BUS    1
         RTS    1
2        BUS    1
3        RTS    1
         BUS    1
4        BUS    1

I would want to identify journeys that involved both BUS and RTS, so in this case, the result should return journey 1 and 3. May I know how this can be done? Thanks.

like image 328
user2798444 Avatar asked Nov 22 '25 03:11

user2798444


1 Answers

You can group by journey alone and then check whether each group contains more than one stage:

>>> df.groupby('journey')['mode'].apply(lambda g: len(g)>1)
journey
1           True
2          False
3           True
4          False
dtype: bool

If your data contains other modes and you specifically need to check for ones that involve "BUS" and "RTS", you can do a more explicit check:

>>> d.groupby('journey')['mode'].apply(lambda g: 'BUS' in g.values and 'RTS' in g.values)
journey
1           True
2          False
3           True
4          False
dtype: bool

The basic idea, though, is that if you want to find "journeys that involved such-and-such", you should group only by journey, and then compute for each group whether it matches the such-and-such condition.

like image 145
BrenBarn Avatar answered Nov 24 '25 20:11

BrenBarn



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!