Groupby names, replace values with their max value across all columns in pandas

I have this DataFrame:

import pandas as pd

lst = [['AAA',15,'BBB',20],['BBB',16,'AAA',12],['BBB',22,'CCC',15],['CCC',11,'AAA',31],['DDD',25,'EEE',35]]
df = pd.DataFrame(lst, columns=['name1','val1','name2','val2'])

which looks like this:

 name1   val1 name2 val2
0  AAA     15  BBB   20
1  BBB     16  AAA   12
2  BBB     22  CCC   15
3  CCC     11  AAA   31
4  DDD     25  EEE   35

I want this:

 name1   val1 name2  val2
0  AAA     31  BBB    22
1  BBB     22  AAA    31
2  BBB     22  CCC    15
3  CCC     15  AAA    31
4  DDD     25  EEE    35

Every value should be replaced with the maximum value for its name, taken across both val1 and val2. For example, AAA appears with 15, 12 and 31, so every AAA value becomes 31.

If I do this, I only get the maximum from val1:

df["val1"] = df.groupby("name1")["val1"].transform("max")
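A small sketch (reusing the question's data) of why the single-column transform falls short: grouping on name1 only ever sees val1, so AAA's 31, which sits in val2, is never considered.

```python
import pandas as pd

lst = [['AAA', 15, 'BBB', 20], ['BBB', 16, 'AAA', 12], ['BBB', 22, 'CCC', 15],
       ['CCC', 11, 'AAA', 31], ['DDD', 25, 'EEE', 35]]
df = pd.DataFrame(lst, columns=['name1', 'val1', 'name2', 'val2'])

# Groups are formed from name1 only and the max is taken over val1 only
only_val1 = df.groupby('name1')['val1'].transform('max')
print(only_val1.tolist())  # [15, 22, 22, 11, 25]

# The desired val1 column, using the max over both name/val pairs
desired_val1 = [31, 22, 22, 15, 25]
```

AAA gets 15 and CCC gets 11 instead of 31 and 15, because the pairs (AAA, 31) and (CCC, 15) live in name2/val2.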


asked Aug 13 '20 17:08

Ajay Chinni

3 Answers

Try using pd.wide_to_long to melt the DataFrame into long form, then use groupby to find the max value per name. Map that max value back onto 'name' and reshape into the original four-column (wide) DataFrame:

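import pandas as pd

# Melt the numbered name/val pairs into long form, indexed by (original row, pair number)
df_long = pd.wide_to_long(df.reset_index(), ['name','val'], 'index', j='num', sep='', suffix=r'\d+')
# Per-name maximum over both val columns
mapper = df_long.groupby('name')['val'].max()
df_long['val'] = df_long['name'].map(mapper)
# Pivot the pair number back into columns and flatten ('name', 1) -> 'name1'
df_new = df_long.unstack()
df_new.columns = [f'{i}{j}' for i, j in df_new.columns]
df_new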

Output:

      name1 name2  val1  val2
index                        
0       AAA   BBB    31    22
1       BBB   AAA    22    31
2       BBB   CCC    22    15
3       CCC   AAA    15    31
4       DDD   EEE    25    35
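To see what the intermediate objects in this answer look like, here is a sketch that stops before the unstack step and inspects df_long and mapper (the question's data is re-created so the block runs on its own):

```python
import pandas as pd

lst = [['AAA', 15, 'BBB', 20], ['BBB', 16, 'AAA', 12], ['BBB', 22, 'CCC', 15],
       ['CCC', 11, 'AAA', 31], ['DDD', 25, 'EEE', 35]]
df = pd.DataFrame(lst, columns=['name1', 'val1', 'name2', 'val2'])

# One row per (original row, pair number): 5 rows x 2 pairs = 10 rows,
# with plain 'name' and 'val' columns instead of name1/name2 and val1/val2
df_long = pd.wide_to_long(df.reset_index(), ['name', 'val'], 'index',
                          j='num', sep='', suffix=r'\d+')
print(list(df_long.index.names))
print(len(df_long))

# Max over every (name, val) pair, regardless of which column pair it came from:
# AAA -> 31, BBB -> 22, CCC -> 15, DDD -> 25, EEE -> 35
mapper = df_long.groupby('name')['val'].max()
print(mapper)
```

Because the pair number j is parsed as an integer suffix, unstacking it gives column tuples like ('name', 1), which the f-string turns back into 'name1'.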


answered Nov 03 '22 20:11

Scott Boston

Borrowing Scott's setup:

df_long = pd.wide_to_long(df.reset_index(), ['name','val'], 'index', j='num', sep='', suffix=r'\d+')
d = df_long.groupby('name')['val'].max()

# Look up each name in d and write the results into the matching val columns
df.loc[:, df.columns.str.startswith('val')] = df.loc[:, df.columns.str.startswith('name')].replace(d).values
df
Out[196]: 
  name1  val1 name2  val2
0   AAA    31   BBB    22
1   BBB    22   AAA    31
2   BBB    22   CCC    15
3   CCC    15   AAA    31
4   DDD    25   EEE    35
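This answer uses replace, while the other answers use map. On this data the two agree because every name appears in d, but they differ for names that are missing from the lookup. A small sketch with hypothetical data (the name 'ZZZ' is made up for illustration):

```python
import pandas as pd

# Hypothetical lookup and name columns; 'ZZZ' is deliberately absent from d
d = pd.Series({'AAA': 31, 'BBB': 22})
names = pd.DataFrame({'name1': ['AAA', 'ZZZ'], 'name2': ['BBB', 'AAA']})

# replace: values not found in d are left unchanged ('ZZZ' stays a string)
replaced = names.replace(d)

# map: values not found in d become NaN
mapped = names.apply(lambda s: s.map(d))

print(replaced)
print(mapped)
```

So with replace, an unmatched name would end up copied into a val column as text, whereas map makes the gap visible as NaN.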

answered Nov 03 '22 20:11

BENY

You can use lreshape (undocumented, and it is unclear whether it is tested or will remain available) to get the long DataFrame, then map each name column through the per-name max to fill its matching val column.

names = df.columns[df.columns.str.startswith('name')]
vals = df.columns[df.columns.str.startswith('val')]

s = (pd.lreshape(df, groups={'name': names, 'val': vals})
       .groupby('name')['val'].max())

for n in names:
    df[n.replace('name', 'val')] = df[n].map(s)

  name1  val1 name2  val2
0   AAA    31   BBB    22
1   BBB    22   AAA    31
2   BBB    22   CCC    15
3   CCC    15   AAA    31
4   DDD    25   EEE    35
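Since lreshape is undocumented, the same long frame can be built with the documented pd.concat instead. A sketch under the assumption that the name and val columns pair up by position (name1 with val1, name2 with val2); this swaps lreshape for concat and is not the answer's own code:

```python
import pandas as pd

lst = [['AAA', 15, 'BBB', 20], ['BBB', 16, 'AAA', 12], ['BBB', 22, 'CCC', 15],
       ['CCC', 11, 'AAA', 31], ['DDD', 25, 'EEE', 35]]
df = pd.DataFrame(lst, columns=['name1', 'val1', 'name2', 'val2'])

# Pair name columns with val columns by position: (name1, val1), (name2, val2)
names = df.columns[df.columns.str.startswith('name')]
vals = df.columns[df.columns.str.startswith('val')]
pairs = list(zip(names, vals))

# Stack every pair under common 'name'/'val' labels into one long frame
long_df = pd.concat(
    [df[[n, v]].rename(columns={n: 'name', v: 'val'}) for n, v in pairs],
    ignore_index=True,
)
s = long_df.groupby('name')['val'].max()

# Overwrite each val column with the per-name max of its paired name column
for n, v in pairs:
    df[v] = df[n].map(s)

print(df)
```

The rest is identical to the lreshape version: only the construction of the long frame changes.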

answered Nov 03 '22 19:11

ALollz


Tags: python, python-3.x, pandas, pandas-groupby