Pandas concat appears to ignore indices

Question

I'm relatively new to Pandas and ran into behavior from pd.concat() that I didn't expect.

import pandas as pd

df1 = pd.DataFrame([], columns=['a', 'b', 'c']).set_index(['b', 'a'])
df2 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c']).set_index(['a', 'b'])  # index order intentionally reversed
pd.concat([df1, df2])

I would expect the result of the above to be:

     c
a b
1 2  3

but instead it is:

     c
b a <---- note that b=1 and a=2 here
1 2  3

In other words, it appears that pd.concat() ignores the index level names while concatenating and only relabels the levels once the concatenation is complete.

On the other hand, pd.concat() works as I would expect with column headers. The result of pd.concat([df1.reset_index(), df2.reset_index()]) is:

     a    b  c
0  1.0  2.0  3

as expected.

Is the behavior I observed with pd.concat() and indices expected?

I tried Googling around, but I haven't been able to find an example of someone running into an issue similar to this.

Thanks!

Valdi_Bo · Accepted Answer

It seems that during concat, Pandas:

Takes the index level names from the first DataFrame only.
For the remaining DataFrames, matches the index levels by position only, regardless of their names.

So for df1 the MultiIndex is composed of columns 1 and 0 (numbering starts from 0), whereas for df2 (and df3 in the example below) it is composed of columns 0 and 1, whatever those columns are called.

To confirm this, try a slightly wider example:

df1 = pd.DataFrame([], columns=['a', 'b', 'c']).set_index(['b', 'a'])
df2 = pd.DataFrame([[1, 2, 3]], columns=['aa', 'bb', 'c']).set_index(['aa', 'bb'])
df3 = pd.DataFrame([[10, 20, 30]], columns=['xx', 'yy', 'c']).set_index(['xx', 'yy'])
pd.concat([df1, df2, df3])

The result is:

        c
b  a     
1  2    3
10 20  30

As you can see, it makes no difference that the source column names differ (for the index columns only). Only their position among the index columns matters.
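The positional matching described above can also be checked by looking at the index values directly. The following is only a minimal sketch: the frames `left` and `right` are hypothetical, non-empty stand-ins rather than the frames from the post, and because the level names of the result may differ between pandas versions, the sketch inspects the values instead of the labels.

```python
import pandas as pd

# Hypothetical non-empty frames (not from the original post); the index
# levels are set in opposite orders on purpose.
left = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"]).set_index(["b", "a"])
right = pd.DataFrame([[4, 5, 6]], columns=["a", "b", "c"]).set_index(["a", "b"])

combined = pd.concat([left, right])

# Each tuple holds (level 0, level 1) in the order the source frame used.
# left contributes its (b, a) values, right contributes its (a, b) values,
# so the two rows are stacked by level position, not by level name.
index_tuples = combined.index.tolist()
print(index_tuples)

# The level names attached to the result may vary with the pandas version.
print(combined.index.names)
```

If the levels were matched by name, both rows would list their values in the same a/b order; here the first row keeps the order b, a and the second keeps a, b.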

But if you change the third column name (of a regular column):

df3 = pd.DataFrame([[10, 20, 30]], columns=['xx', 'yy', 'cc']).set_index(['xx', 'yy'])

(c changed to cc), the result is different:

         c    cc
b  a            
1  2   3.0   NaN
10 20  NaN  30.0
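Since regular columns are matched by name while index levels are matched by position, one way to get name-based alignment is to make the level order agree before concatenating. The sketch below shows two possible approaches under stated assumptions: the frames `left` and `right` are hypothetical examples, and neither approach is claimed to be the recommended one.

```python
import pandas as pd

# Hypothetical frames for illustration, indexed in opposite level orders.
left = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"]).set_index(["b", "a"])
right = pd.DataFrame([[4, 5, 6]], columns=["a", "b", "c"]).set_index(["a", "b"])

# Option 1: turn the index levels back into regular columns, which concat
# matches by name, then rebuild the index in one agreed order.
by_columns = (
    pd.concat([left.reset_index(), right.reset_index()], ignore_index=True)
    .set_index(["a", "b"])
)

# Option 2: bring every index to the same level order before concatenating.
by_levels = pd.concat([left.reorder_levels(["a", "b"]), right])

print(by_columns)
print(by_levels)
```

Both results carry the levels in the order a, b, with each value under the level it came from.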

Tags: python, pandas

bacchuswng