Hi, I've been digging through the concat, join, and merge methods in pandas and can't find what I need.
Let's assume I have two DataFrames:
A = pd.DataFrame("A",index=[0,1,2,3,4],columns=['Col 1','Col 2','Col 3'])
B = pd.DataFrame("B",index=[0,1,2,3,4],columns=['Col 1','Col 2','Col 3'])
>>> A
  Col 1 Col 2 Col 3
0     A     A     A
1     A     A     A
2     A     A     A
3     A     A     A
4     A     A     A
>>> B
  Col 1 Col 2 Col 3
0     B     B     B
1     B     B     B
2     B     B     B
3     B     B     B
4     B     B     B
Now I want to make a new DataFrame with the columns merged. I think it's easiest to explain if I build a MultiIndex showing how I want the columns arranged:
index = pd.MultiIndex.from_product([A.columns.values,['A','B']])
>>> index
MultiIndex(levels=[['Col 1', 'Col 2', 'Col 3'], ['A', 'B']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
Now, if I make an empty DataFrame with this MultiIndex as the columns:
empty_df = pd.DataFrame('-',index=A.index,columns=index)
>>> empty_df
  Col 1    Col 2    Col 3
      A  B     A  B     A  B
0     -  -     -  -     -  -
1     -  -     -  -     -  -
2     -  -     -  -     -  -
3     -  -     -  -     -  -
4     -  -     -  -     -  -
My question is: which merge, concat, or join call do I use to get A and B into that layout? I've tried several options for concat (inner, outer, etc.) and can't get what I want. The only thing I can think of is making the empty DataFrame and then filling it in afterwards.
Edit: After trying out Jezrael's response, it is close but not exactly what I'm after. What I want is something like nested columns. For example:
>>> empty_df['Col 1']
   A  B
0  -  -
1  -  -
2  -  -
3  -  -
4  -  -
Or
>>> empty_df['Col 1']['A']
0    -
1    -
2    -
3    -
4    -
Name: A, dtype: object
So this is a solution I've come up with, but it relies on iterating over the columns:
row_idx = A.index.union(B.index)
col_idx = pd.MultiIndex.from_product([A.columns.values,['A','B']])
new_df = pd.DataFrame('-',index=row_idx,columns=col_idx)
for column in A.columns:
    new_df.loc[:,(column,'A')] = A[column]
    new_df.loc[:,(column,'B')] = B[column]
>>> new_df
  Col 1    Col 2    Col 3
      A  B     A  B     A  B
0     A  B     A  B     A  B
1     A  B     A  B     A  B
2     A  B     A  B     A  B
3     A  B     A  B     A  B
4     A  B     A  B     A  B
>>> new_df['Col 1']
   A  B
0  A  B
1  A  B
2  A  B
3  A  B
4  A  B
>>> new_df['Col 1']['A']
0    A
1    A
2    A
3    A
4    A
Name: A, dtype: object
I think you need concat with the keys parameter and axis=1, then change the order of the levels with DataFrame.swaplevel and sort by the first level with DataFrame.sort_index:
df1 = (pd.concat([A, B], axis=1, keys=('A','B'))
         .swaplevel(0,1, axis=1)
         .sort_index(axis=1, level=0))
print (df1)
  Col 1    Col 2    Col 3
      A  B     A  B     A  B
0     A  B     A  B     A  B
1     A  B     A  B     A  B
2     A  B     A  B     A  B
3     A  B     A  B     A  B
4     A  B     A  B     A  B
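To see why all three calls are needed, it can help to look at the column labels after each step. The following is only a sketch that reuses the A and B frames from the question; the intermediate names step1 and step2 are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

A = pd.DataFrame("A", index=[0, 1, 2, 3, 4], columns=["Col 1", "Col 2", "Col 3"])
B = pd.DataFrame("B", index=[0, 1, 2, 3, 4], columns=["Col 1", "Col 2", "Col 3"])

# Step 1: the keys become the OUTER column level, so all of A's columns
# come first, followed by all of B's columns.
step1 = pd.concat([A, B], axis=1, keys=("A", "B"))

# Step 2: swap the two levels so the original column names are outermost.
# The column order itself has not changed yet.
step2 = step1.swaplevel(0, 1, axis=1)

# Step 3: sort the columns so that same-named columns sit next to each other.
df1 = step2.sort_index(axis=1, level=0)

print(list(step1.columns))
print(list(step2.columns))
print(list(df1.columns))
```

After step 1 the labels start with ('A', 'Col 1'), after step 2 with ('Col 1', 'A') followed by ('Col 2', 'A'), and only after step 3 do ('Col 1', 'A') and ('Col 1', 'B') end up adjacent, which is the nested layout asked for in the question.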
To work with the MultiIndex, you can use DataFrame.xs:
print (df1.xs('Col 1', axis=1, level=0))
   A  B
0  A  B
1  A  B
2  A  B
3  A  B
4  A  B
To select a single MultiIndex column, use a tuple:
print (df1[('Col 1', 'A')])
0    A
1    A
2    A
3    A
4    A
Name: (Col 1, A), dtype: object
To select by both index and column, use loc:
print (df1.loc[4, ('Col 1', 'A')])
A
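The same tools also work on the inner level. As a rough sketch (the names only_b, idx, and first_two_a are made up for this example), xs with level=1 pulls out everything that came from one source frame, and pd.IndexSlice selects across both levels at once. Label slicing with IndexSlice assumes the columns are sorted, which the sort_index call above takes care of.

```python
import pandas as pd

A = pd.DataFrame("A", index=[0, 1, 2, 3, 4], columns=["Col 1", "Col 2", "Col 3"])
B = pd.DataFrame("B", index=[0, 1, 2, 3, 4], columns=["Col 1", "Col 2", "Col 3"])

df1 = (pd.concat([A, B], axis=1, keys=("A", "B"))
         .swaplevel(0, 1, axis=1)
         .sort_index(axis=1, level=0))

# Every column that came from B; xs drops the selected level by default,
# so the result has plain 'Col 1', 'Col 2', 'Col 3' columns again.
only_b = df1.xs("B", axis=1, level=1)

# Slice the outer level and pick one inner label, keeping both levels.
idx = pd.IndexSlice
first_two_a = df1.loc[:, idx["Col 1":"Col 2", "A"]]

print(only_b)
print(first_two_a)
```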