Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge multiple DataFrame columns into one

Tags:

python

pandas

I'm trying to transform a DataFrame with a dynamic number of a_P columns looking like this

             a1_P       a2_P     weight  
0        33297.81   17407.93   14733.23  
1        58895.18   43013.57   86954.04  

into a new DataFrame, looking like this (sorted by P)

                P     weight  
0        17407.93   14733.23
1        33297.81   14733.23  
2        43013.57   86954.04
3        58895.18   86954.04    

So what I'm trying so far is

names = ["a1", "a2"]
p = pd.DataFrame(columns=["P", "weight"])
for i in range(0, len(names)):
  p += df[["{}_P".format(names[i]), "weight"]]

and to sort it afterwards but this does not work because columnnames are not identical I guess.

like image 774
Peter Klauke Avatar asked Feb 26 '26 12:02

Peter Klauke


2 Answers

The pandas.melt function does something like what you want:

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
     weight variable         P
0  14733.23     a1_P  33297.81
1  86954.04     a1_P  58895.18
2  14733.23     a2_P  17407.93
3  86954.04     a2_P  43013.57

And of course, soring by P is easily done by appending a .sort('P') to the end of the melt statement.

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P').sort('P')
     weight variable         P
2  14733.23     a2_P  17407.93
0  14733.23     a1_P  33297.81
3  86954.04     a2_P  43013.57
1  86954.04     a1_P  58895.18

And if you want to be super dynamic, maybe generating the value_vars in this fancy way:

n_values = 2
value_vars = ["a{}_P".format(i+1) for i in range(0, n_values)]
pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P').sort('P')

To get the index to be [0, 1, 2, 3 ...], just use .reset_index(drop=True) either as a chained event, or like this:

df = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
df.sort(inplace=True)
df.reset_index(drop=True, inplace=True)

I personally prefer inplace operations, because they are much much more memory efficient.

like image 103
firelynx Avatar answered Feb 28 '26 02:02

firelynx


A possible solution using Pandas concat (http://pandas.pydata.org/pandas-docs/stable/merging.html):

import pandas as pd                                                                           

df = pd.DataFrame.from_dict({'a1_P': [123.123, 342.123],
                             'a2_P': [232.12, 32.23],
                             'weight': [12312.23, 16232.3]})                        

cols = [x for x in df.columns if '_P' in x]                                         

new = pd.concat([df[col] for col in cols])                                          
oldidx = new.index                                                                  
weights = df.loc[new.index, 'weight'].tolist()                                      

new_df = pd.DataFrame.from_dict({'P': new,                                          
                                 'weight': weights})                                
new_df.sort(columns='P', inplace=True)                                           
new_df.reset_index(drop=True, inplace=True)   

print(new_df)

         P    weight                                                                          
0   32.230  16232.30
1  123.123  12312.23
2  232.120  12312.23
3  342.123  16232.30   

There is room for performance optimizations, but it should faster than a solution with explicit loops.

like image 26
chris-sc Avatar answered Feb 28 '26 02:02

chris-sc



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!