Pandas concat failing

Tags: python, pandas

I am trying to concatenate two dataframes read from the following 2 csv files:

df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0

df_b: https://www.dropbox.com/s/laveuldraurdpu1/df_climatology.csv?dl=0

Both of these have the same number and names of columns. However, when I do this:

pandas.concat([df_a, df_b]) 

I get the error:

AssertionError: Number of manager items must equal union of block items # manager items: 20, # tot_items: 21 

How can I fix this?

user308827 asked Feb 01 '16 18:02


1 Answer

I believe that this error occurs if the following two conditions are met:

  1. The data frames have different columns (i.e. df1.columns.equals(df2.columns) is False).
  2. At least one of the data frames has a repeated column name.
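A minimal sketch for checking both conditions on a pair of data frames. The helper name concat_risk and the toy frames are made up for illustration; they are not the frames from the question.

```python
import pandas as pd


def concat_risk(df1, df2):
    """Report the two conditions listed above for a pair of data frames."""
    # Condition 1: the column labels (including their order) differ.
    same_columns = df1.columns.equals(df2.columns)
    # Condition 2: some column name appears more than once in either frame.
    dupes = set(df1.columns[df1.columns.duplicated()])
    dupes |= set(df2.columns[df2.columns.duplicated()])
    return {"same_columns": same_columns, "duplicates": sorted(dupes)}


# Hypothetical toy frames, not the data from the question.
df1 = pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["C", "B", "B"])

report = concat_risk(df1, df2)
print(report)
```

If same_columns is False and duplicates is non-empty, the pair matches the situation described in this answer.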

Basically, if you concat data frames with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I then try to join a third data frame with columns [B,B,C], it does not know which B column to append to, and it ends up with fewer distinct columns than it thinks it needs.

If your data frames are such that df1.columns == df2.columns, then it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; when the columns are identical, it probably just uses the integer positions or something similar.
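One possible workaround, sketched under the assumption that the repeated names are the cause: give every repeated column a unique name before concatenating. The make_unique helper and its ".1" suffix scheme are illustrative choices, not something pandas does for you here, and the toy frames are hypothetical.

```python
from collections import Counter

import pandas as pd


def make_unique(columns):
    """Return column names with repeats suffixed: B, B.1, B.2, ..."""
    seen = Counter()
    unique = []
    for name in columns:
        # First occurrence keeps its name; later ones get a numeric suffix.
        unique.append(name if seen[name] == 0 else f"{name}.{seen[name]}")
        seen[name] += 1
    return unique


# Hypothetical toy frames with repeated names in a different order.
df1 = pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["C", "B", "B"])

df1.columns = make_unique(df1.columns)  # B, B.1, C
df2.columns = make_unique(df2.columns)  # C, B, B.1

# With unique labels, concat can align the columns by name.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```

Whether the first B in one file really corresponds to the first B in the other is something only the data can tell you, so it is worth checking the renamed columns before relying on the result.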

phil_20686 answered Oct 11 '22 02:10