 

Python - pandas DataFrames - generate a column with group-level information

I generated a pandas DataFrame with:

import pandas as pd
data = {'id': [1.0, 1, 2, 3, 3, 3, 4.0, 4.0, 5, 5], 'some': ['Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes']}
df = pd.DataFrame(data)

In this DataFrame I would like to add a column "someIDlevel" that carries the "some" information at the ID level. The rule: if at least one row of an ID has "Yes" in "some", then "someIDlevel" should be "Yes" for every row of that ID; otherwise it should be "No" for every row of that ID.

So the final DataFrame should look as if it had been created by this code:

data_fin = {'id': [1.0, 1, 2, 3, 3, 3, 4.0, 4.0, 5, 5], 'some': ['Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes'], 'someIDlevel': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes']}
df_fin = pd.DataFrame(data_fin)
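Spelled out per ID, the rule gives: IDs 1, 3 and 5 map to "Yes"; IDs 2 and 4 map to "No". The sketch below computes just that per-ID summary. It uses Series.eq plus groupby(...).any(), a formulation chosen here for illustration rather than anything from the question, and the names per_id and per_id_label are hypothetical.

```python
import pandas as pd

data = {'id': [1.0, 1, 2, 3, 3, 3, 4.0, 4.0, 5, 5],
        'some': ['Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes']}
df = pd.DataFrame(data)

# True for every id that has at least one 'Yes' in 'some'
per_id = df['some'].eq('Yes').groupby(df['id']).any()

# Translate the booleans back into the Yes/No labels used in the question
per_id_label = per_id.map({True: 'Yes', False: 'No'})
print(per_id_label)
```

The result is indexed by id (as floats, because the id column mixes 1.0 and 1), so it still has to be spread back onto the original rows to become a column.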
clog14 asked Jun 01 '26 11:06



1 Answer

You could do the following.

First, compute the per-id value with a groupby and left-merge it back onto df:

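import pandas as pd

# Per-id value: 'Yes' if any row of that id has 'Yes' in 'some', else 'No'.
# reset_index(name=...) turns the grouped Series into an (id, someIDlevel) frame.
flags = (df.some.groupby(df.id)
           .apply(lambda g: 'Yes' if 'Yes' in g.values else 'No')
           .reset_index(name='someIDlevel'))
df = pd.merge(df, flags, on='id', how='left')

Because the new column is named in reset_index, no separate rename step is needed. Without a name, recent pandas versions keep the Series name "some", so a merge without on='id' would join on both "id" and "some" and add no new column. The result:

>>> df
    id  some  someIDlevel
0  1.0   Yes          Yes
1  1.0    No          Yes
2  2.0    No           No
3  3.0   Yes          Yes
4  3.0   Yes          Yes
5  3.0   Yes          Yes
6  4.0    No           No
7  4.0    No           No
8  5.0    No          Yes
9  5.0   Yes          Yes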
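If you would rather avoid the merge altogether, a different technique (not part of the answer above) is groupby(...).transform, which broadcasts each group's result back onto every row of that group, so the output is already aligned with df. A minimal sketch, assuming the question's data and that "Yes"/"No" are the only labels; the names is_yes and any_yes are hypothetical:

```python
import pandas as pd

data = {'id': [1.0, 1, 2, 3, 3, 3, 4.0, 4.0, 5, 5],
        'some': ['Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes']}
df = pd.DataFrame(data)

# One boolean per row: does this row say 'Yes'?
is_yes = df['some'].eq('Yes')

# transform('any') returns a Series with the same index as df,
# holding each id's "any Yes?" result on every row of that id
any_yes = is_yes.groupby(df['id']).transform('any')

# Map the booleans back to the Yes/No labels
df['someIDlevel'] = any_yes.map({True: 'Yes', False: 'No'})
```

Compared with the merge, this keeps the original row order and index untouched and never creates an intermediate per-id frame.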
Ami Tavory answered Jun 04 '26 02:06



