I have the following toy dataframe (the real one has 500k rows):
import pandas as pd
df = pd.DataFrame({'size': list('SSMMMLS'),
'weight': [8, 10, 11, 1, 20, 14, 12],
'adult' : [False] * 5 + [True] * 2})
adult size weight
0 False S 8
1 False S 10
2 False M 11
3 False M 1
4 False M 20
5 True L 14
6 True S 12
I want to group by adult, select the row for which weight is maximal, and assign that row's size value to a new column size2.
In other words, we want a column size2 holding the size value of the max-weight row, propagated to every row of its adult group. So all adult = False lines will have value S because the adult=False max weight is 20.
adult size size2 weight
0 False S S 8
1 False S S 10
2 False M S 11
3 False M S 1
4 False M S 20
5 True L L 14
6 True S L 12
I found this but it doesn't work for me.
So far I have:
df.loc[:, 'size2'] = (df.groupby('adult',as_index=True)['weight','size']
.transform(lambda x: x.ix[x['weight'].idxmax()]['size']))
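The attempt above most likely fails for two reasons: when transform is applied to a grouped DataFrame, pandas requires the function to also work column by column, so x can be a single column in which x['weight'] does not exist; and .ix was removed in pandas 1.0. As a sketch of a different technique (sorting instead of idxmax) that avoids calling a Python function per group, one can sort by weight, keep the heaviest row of each adult group, and map its size back. Note that on this data the heaviest adult=False row is row 4 (weight 20, size M), so that group gets M rather than S.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Sort so each group's heaviest row comes last. mergesort is stable, so rows
# with equal weights keep their original order, and on ties keep='last' picks
# the one appearing last in df (idxmax would pick the first one instead).
heaviest = (df.sort_values('weight', kind='mergesort')
              .drop_duplicates('adult', keep='last')
              .set_index('adult')['size'])

# Look each row's 'adult' value up in the per-group result.
df['size2'] = df['adult'].map(heaviest)
print(df)
```

The tie-breaking rule is the main design difference from the idxmax-based answers below; if ties can occur in the real data, pick whichever rule matches the intended result.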
Just a more detailed version of the @jazrael answer, with your dataframe:
df = pd.DataFrame({'size': list('SSMMMLS'),
'weight': [8, 10, 11, 1, 20, 14, 12],
'adult' : [False] * 5 + [True] * 2})
# adult size weight
# 0 False S 8
# 1 False S 10
# 2 False M 11
# 3 False M 1
# 4 False M 20
# 5 True L 14
# 6 True S 12
To get the size value of the max-weight line:
def size4max_weight(subf):
""" Return size value for the max weight line """
return subf['size'][subf['weight'].idxmax()]
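Two details of size4max_weight are worth knowing: idxmax returns the index label (not the position) of the maximum, and on ties it returns the first such label. A small sketch with a hypothetical group (not from the question), using .loc to make the label-based lookup explicit:

```python
import pandas as pd

# Hypothetical group in which two rows tie for the maximum weight;
# the index labels are deliberately not 0..n-1.
subf = pd.DataFrame({'size': ['S', 'M', 'L'],
                     'weight': [20, 5, 20]},
                    index=[7, 8, 9])

label = subf['weight'].idxmax()   # label of the first maximum: 7
print(subf.loc[label, 'size'])    # prints S, the size of that first row
```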
A groupby on 'adult' produces a Series with False and True as index values:
>>> size2_col = df.groupby('adult').apply(size4max_weight)
>>> type(size2_col), size2_col.index
(pandas.core.series.Series, Index([False, True], dtype='object', name=u'adult'))
With reset_index we convert the Series into a DataFrame:
>>> size2_col = df.groupby('adult').apply(size4max_weight).reset_index(name='size2')
>>> size2_col
adult size2
0 False M
1 True L
>>>
A pd.merge on 'adult' then gives:
>>> pd.merge(df, size2_col, on=['adult'])
adult size weight size2
0 False S 8 M
1 False S 10 M
2 False M 11 M
3 False M 1 M
4 False M 20 M
5 True L 14 L
6 True S 12 L
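Since the per-group Series is already indexed by adult, a merge is not strictly needed: Series.map can look each row's adult value up directly, which keeps df's own index and row order. A sketch under that assumption; selecting ['size', 'weight'] before apply keeps the grouping column out of each group, which recent pandas versions otherwise warn about:

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

def size4max_weight(subf):
    """ Return size value for the max weight line """
    return subf['size'].loc[subf['weight'].idxmax()]

# One value per group, indexed by 'adult' (False, True).
size2_col = df.groupby('adult')[['size', 'weight']].apply(size4max_weight)

# Map every row's 'adult' value to its group's size; no merge, index unchanged.
df['size2'] = df['adult'].map(size2_col)
print(df)
```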
You could use transform with loc and values:
>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
adult size weight size2
0 False S 8 M
1 False S 10 M
2 False M 11 M
3 False M 1 M
4 False M 20 M
5 True L 14 L
6 True S 12 L
Step by step, first we find the appropriate indices:
>>> df.groupby("adult")["weight"].transform("idxmax")
0 4
1 4
2 4
3 4
4 4
5 5
6 5
dtype: int64
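To see what transform("idxmax") does in this first step, the same indices can be built in two explicit steps: one idxmax per group, then a per-row lookup by the adult key. A sketch comparing both:

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# transform('idxmax') computes one idxmax per group and broadcasts it
# back to every row of that group, aligned with df's index.
via_transform = df.groupby('adult')['weight'].transform('idxmax')

# The same result in two explicit steps.
per_group = df.groupby('adult')['weight'].idxmax()   # False -> 4, True -> 5
via_map = df['adult'].map(per_group)

print(via_transform.tolist())   # [4, 4, 4, 4, 4, 5, 5]
print(via_map.tolist())         # [4, 4, 4, 4, 4, 5, 5]
```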
Then we use these to index into the size column with loc:
>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")]
4 M
4 M
4 M
4 M
4 M
5 L
5 L
Name: size, dtype: object
And finally we take .values so that the indices don't get in the way when we try to assign:
>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
array(['M', 'M', 'M', 'M', 'M', 'L', 'L'], dtype=object)
>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
adult size weight size2
0 False S 8 M
1 False S 10 M
2 False M 11 M
3 False M 1 M
4 False M 20 M
5 True L 14 L
6 True S 12 L
>>>
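The .values step exists because the selected Series carries the repeated labels 4 and 5, and assigning it as-is would make pandas align on those labels instead of on df's rows. A sketch of two ways to drop or replace the labels; to_numpy() is the spelling the pandas documentation recommends over .values, and set_axis is an alternative that keeps a Series:

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

idx = df.groupby('adult')['weight'].transform('idxmax')
picked = df['size'].loc[idx]   # index labels 4, 4, 4, 4, 4, 5, 5

# Option 1: discard the labels and assign positionally.
df['size2'] = picked.to_numpy()

# Option 2: keep a Series, but relabel it with df's own index first.
df['size2_alt'] = picked.set_axis(df.index)
print(df)
```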
IIUC you can use merge. I think the first value in size2 is M, because the max weight is 20.
df = pd.DataFrame({'size': list('SSMMMLS'),
'weight': [8, 10, 11, 1, 20, 14, 12],
'adult' : [False] * 5 + [True] * 2})
print(df)
adult size weight
0 False S 8
1 False S 10
2 False M 11
3 False M 1
4 False M 20
5 True L 14
6 True S 12
print(
df.groupby('adult')
.apply(lambda subf: subf['size'][subf['weight'].idxmax()]).reset_index(name='size2')
)
adult size2
0 False M
1 True L
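With 500k rows it may be worth building the same two-column table without a lambda per group: groupby idxmax gives the label of each group's heaviest row (the first one on ties), and loc selects those rows directly. A sketch producing the same adult/size2 table, then merging as above:

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Label of the heaviest row in each 'adult' group.
rows = df.groupby('adult')['weight'].idxmax()

# Select those rows and shape them into the adult/size2 lookup table.
size2_col = (df.loc[rows, ['adult', 'size']]
               .rename(columns={'size': 'size2'})
               .reset_index(drop=True))
print(size2_col)

merged = pd.merge(df, size2_col, on=['adult'])
print(merged)
```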
print(
pd.merge(df,
df.groupby('adult')
.apply(lambda subf: subf['size'][subf['weight'].idxmax()]
).reset_index(name='size2'), on=['adult'])
)
adult size weight size2
0 False S 8 M
1 False S 10 M
2 False M 11 M
3 False M 1 M
4 False M 20 M
5 True L 14 L
6 True S 12 L
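A slightly more defensive variant of the same merge, as a sketch: how='left' keeps every row of df in df's order, and validate='many_to_one' makes pandas raise a MergeError if the lookup table ever contains a duplicated adult key. Note that merging on columns builds a new default index, so a meaningful index on the real frame would not be carried over.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Per-group lookup table with columns 'adult' and 'size2'.
size2_col = (df.groupby('adult')[['size', 'weight']]
               .apply(lambda subf: subf['size'].loc[subf['weight'].idxmax()])
               .reset_index(name='size2'))

# Left join on 'adult', checking that each key appears once in size2_col.
out = pd.merge(df, size2_col, on='adult', how='left', validate='many_to_one')
print(out)
```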