I have this DataFrame
lst = [['AAA',15,'BBB',20],['BBB',16,'AAA',12],['BBB',22,'CCC',15],['CCC',11,'AAA',31],['DDD',25,'EEE',35]]
df = pd.DataFrame(lst,columns = ['name1','val1','name2','val2'])
which looks like this
name1 val1 name2 val2
0 AAA 15 BBB 20
1 BBB 16 AAA 12
2 BBB 22 CCC 15
3 CCC 11 AAA 31
4 DDD 25 EEE 35
I want this
name1 val1 name2 val2
0 AAA 31 BBB 22
1 BBB 22 AAA 31
2 BBB 22 CCC 15
3 CCC 15 AAA 31
4 DDD 25 EEE 35
replaced all values with the maximum value. we choose the maximum value from both val1
and val2
if i do this i will get the maximum from only val1
df["val1"] = df.groupby("name1")["val1"].transform("max")
To get the maximum value of each group, you can directly apply the pandas max() function to the selected column(s) from the result of pandas groupby.
How to groupby multiple columns in pandas DataFrame and compute multiple aggregations? groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
Pandas replace multiple values in column replace. By using DataFrame. replace() method we will replace multiple values with multiple new strings or text for an individual DataFrame column. This method searches the entire Pandas DataFrame and replaces every specified value.
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
Try using pd.wide_to_long
to melt that dataframe into a long form, then use groupby with transform to find the max value. Map that max value to 'name' and reshape back to four column (wide) dataframe:
df_long = pd.wide_to_long(df.reset_index(), ['name','val'], 'index', j='num',sep='',suffix='\d+')
mapper= df_long.groupby('name')['val'].max()
df_long['val'] = df_long['name'].map(mapper)
df_new = df_long.unstack()
df_new.columns = [f'{i}{j}' for i,j in df_new.columns]
df_new
Output:
name1 name2 val1 val2
index
0 AAA BBB 31 22
1 BBB AAA 22 31
2 BBB CCC 22 15
3 CCC AAA 15 31
4 DDD EEE 25 35
Borrow Scott's setting up
df_long = pd.wide_to_long(df.reset_index(), ['name','val'], 'index', j='num',sep='',suffix='\d+')
d = df_long.groupby('name')['val'].max()
df.loc[:,df.columns.str.startswith('val')]=df.loc[:,df.columns.str.startswith('name')].replace(d).values
df
Out[196]:
name1 val1 name2 val2
0 AAA 31 BBB 22
1 BBB 22 AAA 31
2 BBB 22 CCC 15
3 CCC 15 AAA 31
4 DDD 25 EEE 35
You can use lreshape
(undocumented and ambiguous as to whether it's tested or will continue to remain) to get the long DataFrame, then map each pair of columns using the max.
names = df.columns[df.columns.str.startswith('name')]
vals = df.columns[df.columns.str.startswith('val')]
s = (pd.lreshape(df, groups={'name': names, 'val': vals})
.groupby('name')['val'].max())
for n in names:
df[n.replace('name', 'val')] = df[n].map(s)
name1 val1 name2 val2
0 AAA 31 BBB 22
1 BBB 22 AAA 31
2 BBB 22 CCC 15
3 CCC 15 AAA 31
4 DDD 25 EEE 35
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With