I am parsing data from excel files and the columns of the resulting DataFrame may or may not align to a base DataFramewhere I want to stack several parsed DataFrame.
Lets call the DataFrame I parse from data A, and the base DataFrame df_A.
I read an excel shee resulting in A=
Index AGUB AGUG MUEB MUEB SIL SIL SILB SILB
2012-01-01 00:00:00 0.00 0 0.00 50.78 0.00 0.00 0.00 0.00
2012-01-01 01:00:00 0.00 0 0.00 53.15 0.00 53.15 0.00 0.00
2012-01-01 02:00:00 0.00 0 0.00 0.00 53.15 53.15 53.15 53.15
2012-01-01 03:00:00 0.00 0 0.00 0.00 0.00 55.16 0.00 0.00
2012-01-01 04:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 05:00:00 48.96 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 06:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 07:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 08:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 09:00:00 52.28 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 10:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 11:00:00 36.93 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 12:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 13:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 50.00
2012-01-01 14:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 34.01
2012-01-01 15:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 16:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 17:00:00 53.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 18:00:00 0.00 75 0.00 75.00 0.00 75.00 0.00 0.00
2012-01-01 19:00:00 0.00 70 0.00 70.00 0.00 0.00 0.00 0.00
2012-01-01 20:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 21:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 22:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 23:00:00 0.00 0 53.45 53.45 0.00 0.00 0.00 0.00
I create the base dataframe:
units = ['MUE', 'MUEB', 'SIL', 'SILB', 'AGUG', 'AGUB', 'MUEBP', 'MUELP']
df_A = pd.DataFrame(columns=units)
df_A = pd.concat([df_A, A], axis=0)
Usually with concat if A had less columns than df_A it'll be fine, but in this case the only difference in the columns is the order. the concatenation leads to the following error:
ValueError: Plan shapes are not aligned
I'd like to know how to concatenate the two dataframes with the column order given by df_A.
I've tried this and it doesn't matter whether there are more columns in the source, or target defined DataFrame - either way, the result is a dataframe that consists of a union of all supplied columns (with empty columns specified in the target, but not populated by the source populated with NaN).
Where I have been able to reproduce your error is where the column names in either the source or target dataframe include a duplicate name (or empty column names).
In your example, various columns appear more than once in your source file. I don't think concat copes very well with these kinds of duplicate columns.
import pandas as pd
s1 = [0,1,2,3,4,5]
s2 = [0,0,0,0,1,1]
A = pd.DataFrame([s2,s1],columns=['A','B','C','D','E','F'])
Resulting in:
A B C D E F ----------- 0 0 0 0 1 1 0 1 2 3 4 5
Take a subset of columns and use them to create a new dataframe called B
B = A[['A','C','E']]
A C E ----- 0 0 1 0 2 4
Create a new empty target dataframe
col_names = ['D','A','C','B']
Z = pd.DataFrame(columns=col_names)
D A C B -------
And concatenate the two:
Z = pd.concat([B,Z],axis=0)
A C D E 0 0 NaN 1 0 2 NaN 4
Works fine!
But if I recreate the empty dataframe using columns as so:
col_names = ['D','A','C','D']
Z = pd.DataFrame(columns=col_names)
D A C D
And try to concatenate:
col_names = ['D','A','C','D']
Z = pd.DataFrame(columns=col_names)
Then I get the error you describe.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With