
boolean operation with groupby in pandas

I would like to use pandas.groupby in a particular way. Given a DataFrame with two boolean columns (call them col1 and col2) and an id column, I want to add a column in the following way:

For every entry, if (col2 is True) and (col1 is True for any of the entries with the same id), then assign True. Otherwise assign False.

I have made a simple example:

import pandas as pd

df = pd.DataFrame([[0,1,1,2,2,3,3],[False, False, False, False, False, False, True],[False, True, False, False, True, True, False]]).transpose()
df.columns = ['id', 'col1', 'col2']

gives the following DataFrame:

     id   col1   col2
0    0   False   False
1    1   False   True
2    1   False   False
3    2   False   False
4    2   False   True
5    3   False   True
6    3   True    False

According to the above rule, the following column should be added:

0    False
1    False
2    False
3    False
4    False
5     True
6    False

Any ideas on an elegant way to do this?

splinter asked Mar 22 '17 04:03



2 Answers

df.groupby('id').col1.transform('any') & df.col2

0    False
1    False
2    False
3    False
4    False
5     True
6    False
dtype: bool
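
For reference, here is a minimal self-contained sketch of the same approach that also attaches the result as a new column, as the question asks. It assumes the example data from the question, built column by column so the dtypes stay int/bool; the column name `result` is a hypothetical choice, not something from the original post.

```python
import pandas as pd

# Example data from the question, built column-wise so dtypes stay int/bool.
df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# transform('any') computes "is col1 True anywhere in this id group?"
# and broadcasts that answer back to every row of the group.
any_col1 = df.groupby("id")["col1"].transform("any")

# 'result' is a hypothetical column name chosen for illustration.
df["result"] = any_col1 & df["col2"]
print(df)
```

Because `transform` returns a Series aligned with the original index, the `&` with `df["col2"]` is a plain row-wise operation and no merge is needed.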
piRSquared answered Oct 16 '22 05:10


This code will produce the output you requested:

df2 = df.merge(df.groupby('id')['col1'] # group on "id" and select 'col1'
                    .any()              # True if any items are True
                    .rename('cond2')    # name Series 'cond2'
                    .to_frame()         # make a dataframe for merging
                    .reset_index())     # reset_index to get id column back
print(df2.col2 & df2.cond2)             # True when 'col2' and 'cond2' are True
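
A possible variant of this idea, sketched here under the assumption that the example data from the question is used: instead of merging, the per-id `any()` result can be looked up with `Series.map`, which keeps the original index so the result can be assigned straight back to `df`. This swaps `merge` for `map` and is not the code from the answer above; the names `any_by_id` and `result` are hypothetical.

```python
import pandas as pd

# Example data from the question, built column-wise so dtypes stay int/bool.
df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# One boolean per id: True if any col1 value in that group is True.
any_by_id = df.groupby("id")["col1"].any()

# Look up each row's id in that per-group Series (index is preserved).
cond2 = df["id"].map(any_by_id)

# 'result' is a hypothetical column name chosen for illustration.
df["result"] = df["col2"] & cond2
print(df)
```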
Craig answered Oct 16 '22 05:10