I am trying to column-bind dataframes and having issue with pandas concat
, as ignore_index=True
doesn't seem to work:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 2, 3,4]) df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'], 'C': ['C4', 'C5', 'C6', 'C7'], 'D2': ['D4', 'D5', 'D6', 'D7']}, index=[ 5, 6, 7,3]) df1 # A B D # 0 A0 B0 D0 # 2 A1 B1 D1 # 3 A2 B2 D2 # 4 A3 B3 D3 df2 # A1 C D2 # 5 A4 C4 D4 # 6 A5 C5 D5 # 7 A6 C6 D6 # 3 A7 C7 D7 dfs = [df1,df2] df = pd.concat( dfs,axis=1,ignore_index=True) print df
and the result is
0 1 2 3 4 5 0 A0 B0 D0 NaN NaN NaN 2 A1 B1 D1 NaN NaN NaN 3 A2 B2 D2 A7 C7 D7 4 A3 B3 D3 NaN NaN NaN 5 NaN NaN NaN A4 C4 D4 6 NaN NaN NaN A5 C5 D5 7 NaN NaN NaN A6 C6 D6
Even if I reset index using
df1.reset_index() df2.reset_index()
and then try
pd.concat([df1,df2],axis=1)
it still produces the same result!
If you want the concatenation to ignore existing indices, you can set the argument ignore_index=True . Then, the resulting DataFrame index will be labeled with 0 , …, n-1 . To concatenate DataFrames horizontally along the axis 1 , you can set the argument axis=1 .
concat function is 50 times faster than using the DataFrame. append version. With multiple append , a new DataFrame is created at each iteration, and the underlying data is copied each time.
ignore_index : If True, do not use the index labels. verify_integrity : If True, raise ValueError on creating index with duplicates. sort : Sort columns if the columns of self and other are not aligned. The default sorting is deprecated and will change to not-sorting in a future version of pandas.
Use pandas. concat() to concatenate/merge two or multiple pandas DataFrames across rows or columns. When you concat() two pandas DataFrames on rows, it creates a new Dataframe containing all rows of two DataFrames basically it does append one DataFrame with another.
If I understood you correctly, this is what you would like to do.
import pandas as pd df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3'], 'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 2, 3,4]) df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'], 'C': ['C4', 'C5', 'C6', 'C7'], 'D2': ['D4', 'D5', 'D6', 'D7']}, index=[ 4, 5, 6 ,7]) df1.reset_index(drop=True, inplace=True) df2.reset_index(drop=True, inplace=True) df = pd.concat( [df1, df2], axis=1)
Which gives:
A B D A1 C D2 0 A0 B0 D0 A4 C4 D4 1 A1 B1 D1 A5 C5 D5 2 A2 B2 D2 A6 C6 D6 3 A3 B3 D3 A7 C7 D7
Actually, I would have expected that df = pd.concat(dfs,axis=1,ignore_index=True)
gives the same result.
This is the excellent explanation from jreback:
ignore_index=True
‘ignores’, meaning doesn’t align on the joining axis. it simply pastes them together in the order that they are passed, then reassigns a range for the actual index (e.g.range(len(index))
) so the difference between joining on non-overlapping indexes (assumeaxis=1
in the example), is that withignore_index=False
(the default), you get the concat of the indexes, and withignore_index=True
you get a range.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With