I would like to use pandas.groupby in a particular way. Given a DataFrame with two boolean columns (call them col1 and col2) and an id column, I want to add a column as follows:
for every entry, if col2 is True and col1 is True for any of the entries with the same id, assign True; otherwise assign False.
I have made a simple example:
import pandas as pd
df = pd.DataFrame({'id': [0, 1, 1, 2, 2, 3, 3],  # dict constructor keeps int/bool dtypes (transpose gives object)
                   'col1': [False, False, False, False, False, False, True],
                   'col2': [False, True, False, False, True, True, False]})
gives the following DataFrame:
   id   col1   col2
0   0  False  False
1   1  False   True
2   1  False  False
3   2  False  False
4   2  False   True
5   3  False   True
6   3   True  False
According to the above rule, the following column should be added:
0 False
1 False
2 False
3 False
4 False
5 True
6 False
Any ideas on an elegant way to do this?
A groupby operation in Pandas splits the object into groups, applies a function to each group, and then combines the results. After grouping the data by the columns of our choice, we can perform various operations that help in analyzing it.
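To make the split-apply-combine idea concrete, here is a minimal sketch that performs the three steps by hand on the question's example data and compares the result with a single groupby call. It assumes the DataFrame is built with bool columns; the names manual and builtin are illustrative only.

```python
import pandas as pd

# The question's example data, built with explicit int/bool dtypes
df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# Split: iterate over one sub-DataFrame per distinct id
# Apply: reduce each group's col1 with any()
# Combine: collect the per-group results into a Series keyed by id
manual = pd.Series({key: group["col1"].any() for key, group in df.groupby("id")})

# groupby performs the same three steps in a single call
builtin = df.groupby("id")["col1"].any()

print(manual.tolist())   # [False, False, False, True]
print(builtin.tolist())  # [False, False, False, True]
```

Only id 3 has a True in col1, so only its group reduces to True.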
The abstract definition of grouping is to provide a mapping of labels to group names. There are multiple ways to split a Pandas dataset; here the grouping objects are referred to as keys. To group data by one key, pass only that key as an argument to the groupby() function.
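As a rough illustration of grouping keys, the sketch below groups the question's data first by a single key and then by a list of two keys. size() is used only to show how many rows land in each group; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# One key: pass a single column label
rows_per_id = df.groupby("id").size()

# Several keys: pass a list of labels; the result is indexed by a MultiIndex
rows_per_id_and_col2 = df.groupby(["id", "col2"]).size()

print(rows_per_id.tolist())  # [1, 2, 2, 2]
print(rows_per_id_and_col2.index.nlevels, len(rows_per_id_and_col2))  # 2 7
```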
If you call dir() on a Pandas GroupBy object, you'll see enough methods to make your head spin! It can be hard to keep track of all the functionality of a Pandas GroupBy object. One way to clear the fog is to compartmentalize the different methods by what they do and how they behave.
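One distinction that matters for this question is aggregation versus transformation. The minimal sketch below, on the question's data with illustrative variable names, shows that .any() returns one value per group, while .transform('any') returns one value per original row aligned to df's index, which is what lets the result be combined element-wise with col2.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# Aggregation: one value per group, indexed by the group key (id)
per_group = df.groupby("id")["col1"].any()

# Transformation: the group's value broadcast back to every row of that group
per_row = df.groupby("id")["col1"].transform("any")

print(len(per_group), len(per_row))  # 4 7
print(per_row.tolist())  # [False, False, False, False, False, True, True]
```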
df.groupby('id').col1.transform('any') & df.col2
0 False
1 False
2 False
3 False
4 False
5 True
6 False
dtype: bool
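Since the question asks to add a column, here is a minimal sketch that stores the transform-based result in df itself. The column name result is a hypothetical choice; the computation is the same as the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# For every row: does any row with the same id have col1 == True?
any_col1_in_group = df.groupby("id")["col1"].transform("any")

# Combine element-wise with col2; 'result' is a hypothetical column name
df["result"] = any_col1_in_group & df["col2"]

print(df["result"].tolist())  # [False, False, False, False, False, True, False]
```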
This code will produce the output you requested:
df2 = df.merge(df.groupby('id')['col1']  # group on "id" and select 'col1'
                 .any()                   # True if any items in the group are True
                 .rename('cond2')         # name the Series 'cond2'
                 .to_frame()              # make a DataFrame for merging
                 .reset_index(),          # reset_index to get the id column back
               on='id', how='left')       # left merge keeps df's row order
print(df2.col2 & df2.cond2)  # True when both 'col2' and 'cond2' are True
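An alternative to the merge, not taken from either answer, is Series.map: compute one boolean per id, then look up each row's id in that Series. This is a minimal sketch on the question's data; because map returns a Series on df's own index, no merged frame or reset_index is needed.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [0, 1, 1, 2, 2, 3, 3],
    "col1": [False, False, False, False, False, False, True],
    "col2": [False, True, False, False, True, True, False],
})

# One boolean per id: does that id have any col1 == True?
any_col1 = df.groupby("id")["col1"].any()

# Map each row's id to its group's boolean, then AND with col2
result = df["col2"] & df["id"].map(any_col1)

print(result.tolist())  # [False, False, False, False, False, True, False]
```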