This is a self answered question. Given two dataFrames,
x
0 1
0 1 2
1 3 4
y
0 1 2
0 5 6 7
1 8 9 X
2 Y Z 0
The diagonal concatenation of x and y is given by:
0 1 3 4 5
0 1.0 2.0 NaN NaN NaN
1 3.0 4.0 NaN NaN NaN
2 NaN NaN 5 6 7
3 NaN NaN 8 9 X
4 NaN NaN Y Z 0
What is the easiest and simplest way of doing this? I would like to consider two cases:
First, the simple case. Assuming both the headers and indexes are monotonically numeric, you can just modify y's indexers as offsets from x:
y.index += x.index[-1] + 1
y.columns += x.columns[-1] + 1
pd.concat([x, y])
0 1 2 3 4
0 1.0 2.0 NaN NaN NaN
1 3.0 4.0 NaN NaN NaN
2 NaN NaN 5 6 7
3 NaN NaN 8 9 X
4 NaN NaN Y Z 0
Now, to generalise this to multiple DataFrames, we iterate over a loop:
df_list = [x, y]
offset_x = offset_y = 0
for df in df_list:
df.index = np.arange(len(df)) + offset_x
df.columns = np.arange(len(df.columns)) + offset_y
offset_x += df.index[-1] + 1
offset_y += df.columns[-1] + 1
pd.concat(df_list)
0 1 2 3 4
0 1.0 2.0 NaN NaN NaN
1 3.0 4.0 NaN NaN NaN
2 NaN NaN 5 6 7
3 NaN NaN 8 9 X
4 NaN NaN Y Z 0
If either your index/columns are not monotonically increasing, I strongly suggest resetting them before concatenating, or look into the option below.
If you're okay with 0s instead of NaNs, you can use scipy's block_diag without having to modify either the indices or columns:
from scipy.linalg import block_diag
pd.DataFrame(block_diag(*df_list))
0 1 2 3 4
0 1 2 0 0 0
1 3 4 0 0 0
2 0 0 5 6 7
3 0 0 8 9 X
4 0 0 Y Z 0
Credit to this answer for this solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With