Convert column suffixes from pandas join into a MultiIndex

Tags: python, pandas

I have two pandas DataFrames whose index and column names are not necessarily identical.

>>> import pandas as pd

>>> df_L = pd.DataFrame({'X': [1, 3],
                         'Y': [5, 7]})

>>> df_R = pd.DataFrame({'X': [2, 4],
                         'Y': [6, 8]})

I can join them together and assign suffixes.

>>> df_L.join(df_R, lsuffix='_L', rsuffix='_R')

    X_L Y_L X_R Y_R
0   1   5   2   6
1   3   7   4   8

But what I want is to make 'L' and 'R' sub-columns under both 'X' and 'Y'.

The desired DataFrame looks like this:

>>> pd.DataFrame(columns=pd.MultiIndex.from_product([['X', 'Y'], ['L', 'R']]),
         data=[[1, 2, 5, 6],
               [3, 4, 7, 8]])

   X     Y
   L  R  L  R
0  1  2  5  6
1  3  4  7  8

Is there a way I can combine the two original DataFrames to get this desired DataFrame?

asked Nov 02 '18 20:11

Vermillion

1 Answer

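You can use pd.concat with the keys argument along the column axis (axis=1), then swap the two column levels and sort the columns so that 'L' and 'R' end up under each of 'X' and 'Y':

>>> df = pd.concat([df_L, df_R], keys=['L', 'R'], axis=1).swaplevel(0, 1, axis=1).sort_index(level=0, axis=1)

>>> df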
   X     Y   
   L  R  L  R
0  1  2  5  6
1  3  4  7  8
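
To see what each call in that chain contributes, the one-liner can be split into three steps. This is only an illustrative breakdown of the same approach; the names step1, step2 and result are made up for the example.

```python
import pandas as pd

df_L = pd.DataFrame({'X': [1, 3], 'Y': [5, 7]})
df_R = pd.DataFrame({'X': [2, 4], 'Y': [6, 8]})

# Step 1: concatenate side by side; the keys become the OUTER column level.
# Columns are now (L, X), (L, Y), (R, X), (R, Y).
step1 = pd.concat([df_L, df_R], keys=['L', 'R'], axis=1)

# Step 2: swap the two column levels so X/Y move to the top.
# Columns are now (X, L), (Y, L), (X, R), (Y, R); the data order is unchanged.
step2 = step1.swaplevel(0, 1, axis=1)

# Step 3: sort the columns so each top-level label is contiguous.
# Columns are now (X, L), (X, R), (Y, L), (Y, R).
result = step2.sort_index(level=0, axis=1)

print(result)
```

Without the final sort_index, the values are all there but 'X' and 'Y' each appear twice in the header instead of being grouped.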

answered Sep 24 '22 15:09

sacuL
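
If the suffixed frame from join already exists, another option (not part of the answer above, just a sketch) is to split the column names on the suffix separator and let pandas build the MultiIndex. This assumes every column ends in '_L' or '_R'; the names joined and nested are hypothetical.

```python
import pandas as pd

df_L = pd.DataFrame({'X': [1, 3], 'Y': [5, 7]})
df_R = pd.DataFrame({'X': [2, 4], 'Y': [6, 8]})

# The suffixed frame from the question: columns X_L, Y_L, X_R, Y_R.
joined = df_L.join(df_R, lsuffix='_L', rsuffix='_R')

nested = joined.copy()
# Split each name on its LAST underscore; expand=True on an Index yields a MultiIndex.
# Assumption: every column carries exactly one of the two suffixes.
nested.columns = nested.columns.str.rsplit('_', n=1, expand=True)

# Group the sub-columns under each top-level label.
nested = nested.sort_index(axis=1)

print(nested)
```

Splitting from the right with n=1 keeps original column names intact even if they contain underscores themselves.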
