I am trying to concat dataframes based on the following two CSV files:
df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0
df_b: https://www.dropbox.com/s/laveuldraurdpu1/df_climatology.csv?dl=0
Both of these have the same number of columns, with the same names. However, when I do this:
pandas.concat([df_a, df_b])
I get the error:
AssertionError: Number of manager items must equal union of block items # manager items: 20, # tot_items: 21
How can I fix this?
In this benchmark, concatenating multiple DataFrames with the pandas.concat function is 50 times faster than using the DataFrame.append method. With repeated append calls, a new DataFrame is created at each iteration, and the underlying data is copied each time.
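The usual way to avoid the repeated copying is to collect the pieces in a list and call pandas.concat once. A minimal sketch, assuming the pieces are small frames with identical columns (the frames and the names `pieces` and `combined` are made up for illustration). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this pattern is also the only option on current versions.

```python
import pandas as pd

# Hypothetical input: ten small frames that all share the columns "x" and "y".
pieces = [pd.DataFrame({"x": [i], "y": [i * i]}) for i in range(10)]

# Collect first, concatenate once: the data is copied a single time,
# instead of once per loop iteration as with repeated append calls.
combined = pd.concat(pieces, ignore_index=True)
print(combined.shape)  # (10, 2)
```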
The pandas.concat() function does all the heavy lifting of concatenating pandas objects along an axis, while performing optional set logic (union or intersection) on the indexes (if any) of the other axes.
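The union/intersection choice is controlled by the `join` argument of pandas.concat. A small sketch with two made-up frames whose columns only partly overlap: the default `join="outer"` keeps the union of the column labels and fills the gaps with NaN, while `join="inner"` keeps only the shared labels.

```python
import pandas as pd

# Hypothetical frames: they share column "b" only.
left = pd.DataFrame({"a": [1], "b": [2]})
right = pd.DataFrame({"b": [3], "c": [4]})

outer = pd.concat([left, right])                # default join="outer": union of columns
inner = pd.concat([left, right], join="inner")  # intersection of columns

print(list(outer.columns))  # ['a', 'b', 'c']
print(list(inner.columns))  # ['b']
```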
If you want the concatenation to ignore the existing indexes, set the argument ignore_index=True; the resulting DataFrame index will then be labeled 0, …, n-1. To concatenate DataFrames horizontally along axis 1, set the argument axis=1.
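Both options in one short sketch (the frames are invented for the example): `ignore_index=True` discards the overlapping labels 10, 11, 10 and renumbers the rows, and `axis=1` places the frames side by side, aligned on their index.

```python
import pandas as pd

# Hypothetical frames whose index labels collide (10 appears in both).
top = pd.DataFrame({"v": [1, 2]}, index=[10, 11])
bottom = pd.DataFrame({"v": [3]}, index=[10])

# Stack vertically and relabel the rows 0 .. n-1.
stacked = pd.concat([top, bottom], ignore_index=True)
print(list(stacked.index))  # [0, 1, 2]

# Concatenate horizontally: columns are added, rows aligned on the index.
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})
side_by_side = pd.concat([left, right], axis=1)
print(list(side_by_side.columns))  # ['a', 'b']
```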
The concat function concatenates DataFrames along rows or columns; we can think of it as stacking multiple DataFrames. Merge combines DataFrames based on the values in shared columns. This makes merge more flexible than concat, because rows are combined according to a condition (matching key values) rather than simply stacked.
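To make the difference concrete, here is a sketch with two invented frames that list the same items in a different order. `concat(axis=1)` pairs rows by index position, which mismatches the items (and produces a duplicated "item" column), whereas `merge(on="item")` matches rows on the key values.

```python
import pandas as pd

# Hypothetical data: same items, different row order.
prices = pd.DataFrame({"item": ["apple", "pear"], "price": [1.0, 2.0]})
stock = pd.DataFrame({"item": ["pear", "apple"], "qty": [5, 7]})

# concat aligns on the index (0, 1), so row 0 pairs "apple" with "pear".
by_index = pd.concat([prices, stock], axis=1)
print(list(by_index.columns))  # ['item', 'price', 'item', 'qty']

# merge matches rows on the values of the shared "item" column.
by_key = prices.merge(stock, on="item")
print(by_key)
```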
I believe that this error occurs if the following two conditions are met:
1. The dataframes have different columns, i.e. df1.columns.equals(df2.columns) is False.
2. The columns contain a repeated name.
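A quick way to check a pair of frames before concatenating them is to test whether the column labels differ and which column names are repeated. The helper `concat_risk` below is hypothetical (not a pandas function); it only uses `Index.equals` and `Index.duplicated`, and it assumes string column names so that they can be sorted.

```python
import pandas as pd


def concat_risk(df1: pd.DataFrame, df2: pd.DataFrame) -> dict:
    """Hypothetical helper: report whether columns differ and which names repeat."""
    dupes = [
        name
        for frame in (df1, df2)
        for name in frame.columns[frame.columns.duplicated()]
    ]
    differ = not df1.columns.equals(df2.columns)
    return {
        "columns_differ": differ,
        "duplicate_names": sorted(set(dupes)),
        # Both conditions together are the combination suspected to break concat.
        "likely_to_fail": differ and bool(dupes),
    }


x = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
y = pd.DataFrame([[4, 5, 6]], columns=["B", "B", "C"])
print(concat_risk(x, y))
# {'columns_differ': True, 'duplicate_names': ['B'], 'likely_to_fail': True}
```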
Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I try to join a third dataframe with columns [B,B,C], it does not know which column to append the B values to, and it ends up with fewer distinct columns than it thinks it needs.
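A small reproduction of that scenario, using made-up one-row frames. The first concat succeeds and produces the union [A, B, C, D]; the second one, which adds a frame with a repeated "B", raises. The exact exception type and message depend on the pandas version (old releases raised the AssertionError from the question), so the sketch catches a generic Exception rather than naming one.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["B", "C", "D"])
df3 = pd.DataFrame([[7, 8, 9]], columns=["B", "B", "C"])

# Distinct names: concat takes the union of labels and fills gaps with NaN.
ok = pd.concat([df1, df2])
print(list(ok.columns))  # ['A', 'B', 'C', 'D']

# Differing columns plus a repeated "B": pandas cannot align the labels.
failed = False
try:
    pd.concat([ok, df3])
except Exception as exc:  # exception type varies across pandas versions
    failed = True
    print(type(exc).__name__, exc)
```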
If your dataframes are such that df1.columns.equals(df2.columns) is True, then it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; if the columns are identical, it probably just uses the integer positions or something.
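A sketch of both cases plus two possible workarounds, assuming it is acceptable to rename or drop the repeated columns. `dedupe_columns` is a hypothetical helper that renames repeats positionally (B, B becomes B, B.1); be aware that the second "B" of one frame is not guaranteed to mean the same thing as the second "B" of another. The alternative keeps only the first column for each name via `Index.duplicated`.

```python
import pandas as pd

a = pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"])
b = pd.DataFrame([[4, 5, 6]], columns=["B", "B", "C"])
c = pd.DataFrame([[7, 8, 9]], columns=["C", "B", "B"])

# Identical labels (duplicates included): no alignment needed, so this works.
same = pd.concat([a, b], ignore_index=True)


def dedupe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rename repeated labels B, B -> B, B.1.

    Assumes names such as "B.1" do not already exist in the frame.
    """
    seen: dict = {}
    new_cols = []
    for col in df.columns:
        n = seen.get(col, 0)
        new_cols.append(col if n == 0 else f"{col}.{n}")
        seen[col] = n + 1
    out = df.copy()
    out.columns = new_cols
    return out


# After renaming, every label is unique, so [B, B.1, C] aligns with [C, B, B.1].
fixed = pd.concat([dedupe_columns(a), dedupe_columns(c)], ignore_index=True)
print(fixed)

# Alternative: keep only the first column for each repeated name.
first_only = c.loc[:, ~c.columns.duplicated()]
```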