Pandas concat yields ValueError: Plan shapes are not aligned

In pandas, I am attempting to concatenate a set of dataframes and I am getting this error:

ValueError: Plan shapes are not aligned

My understanding of .concat() is that it joins on the columns the dataframes share and fills with NA any column that is missing from a given dataframe. That doesn't seem to be the case here.

Here's the concat statement:

dfs = [npo_jun_df, npo_jul_df, npo_may_df, npo_apr_df, npo_feb_df]
alpha = pd.concat(dfs)
Lt.Fr0st asked Oct 06 '14 23:10


2 Answers

In case it helps, I also hit this error when I tried to concatenate two dataframes (and as of the time of writing, this is the only related hit I can find on Google other than the source code).

I don't know whether this answer would have solved the OP's problem (they didn't post enough information to tell), but for me the error occurred when I tried to concat a dataframe df1 with columns ['A', 'B', 'B', 'C'] (note the duplicate column heading) with a dataframe df2 with columns ['A', 'B']. Understandably, the duplication caused pandas to throw a wobbly. Changing df1's columns to ['A', 'B', 'C'] (i.e. dropping one of the duplicate columns) made everything work fine.
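
As a rough sketch of that fix (the values below are made up, and the exact exception type and message may differ between pandas versions), one way to drop the repeated column before concatenating is to mask out later occurrences of each column name. Note that this keeps only the first 'B' column and discards the data in the second one, which may or may not be what you want.

```python
import pandas as pd

# Toy frames mirroring the column layout described above (values are invented).
df1 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["A", "B", "B", "C"])
df2 = pd.DataFrame([[9, 10], [11, 12]], columns=["A", "B"])

# True for the second and later occurrences of a column name.
dup_mask = df1.columns.duplicated()

# Keep only the first occurrence of each column name.
df1_unique = df1.loc[:, ~dup_mask]

# With unique column names, concat aligns on A and B and
# fills column C with NaN for the rows that come from df2.
result = pd.concat([df1_unique, df2], ignore_index=True)
print(result)
```

If both 'B' columns carry data you need, renaming one of them instead of dropping it is the safer choice.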

user3805082 answered Oct 12 '22 01:10


I recently got this message too and, like @jason and @user3805082 above, found that I had duplicate columns in several of the hundreds of dataframes I was trying to concat, each with dozens of enigmatic varnames. Manually searching for duplicates was not practical.

In case anyone else has the same problem, I wrote the following function which might help out.

from collections import Counter

def duplicated_varnames(df):
    """Return a dict mapping each duplicated variable name in a
    given dataframe to the number of times it appears."""
    # count how often each column label occurs
    counts = Counter(df.columns)
    # keep only the labels that occur more than once
    return {varname: n for varname, n in counts.items() if n > 1}

Then you can iterate over that dict to report how many duplicates there are, delete the duplicated variables, or rename them in some systematic way.
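
For the renaming option, here is a minimal sketch of one possible scheme. The helper name make_unique and the "_1", "_2" suffix convention are my own assumptions, not anything pandas provides. It leaves the first occurrence of a name alone and numbers the later ones.

```python
from collections import Counter

def make_unique(names):
    """Return a list of names where repeated entries get a numeric suffix.

    Hypothetical helper: the first occurrence is kept as-is, later
    occurrences become name_1, name_2, and so on.
    """
    seen = Counter()  # how many times each name has appeared so far
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return unique

print(make_unique(["A", "B", "B", "C"]))  # ['A', 'B', 'B_1', 'C']
```

You would apply it with something like df.columns = make_unique(df.columns) before calling pd.concat. One caveat: if a dataframe already contains a column literally named 'B_1', this simple scheme would produce a new clash, so check for that first.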

William Welsh answered Oct 12 '22 03:10