How do I append to a Pandas DataFrame containing predefined columns of categorical datatype:
df=pd.DataFrame([],columns=['a','b'])
df['a']=pd.Categorical([],categories=[0,1])
new_df=pd.DataFrame.from_dict({'a':[1],'b':[0]})
df.append(new_df)
The above throws an error:
ValueError: all the input arrays must have same number of dimensions
Update: if the categories are strings as opposed to ints, appending seems to work:
df['a']=pd.Categorical([],categories=['Left','Right'])
new_df=pd.DataFrame.from_dict({'a':['Left'],'b':[0]})
df.append(new_df)
So, how do I append to DataFrames with categories of int values? Secondly, I presumed that with binary values (0/1), storing the column as Categorical instead of numeric datatype would be more efficient or faster. Is this true? If not, I may not even bother to convert my columns to Categorical type.
You have to keep both data frames consistent. Since you are converting column a of the first data frame to categorical, you need to do the same for the second data frame. You can do it as follows:
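import pandas as pd
# Empty frame whose column 'a' is categorical with categories 0 and 1
df = pd.DataFrame([], columns=['a', 'b'])
df['a'] = pd.Categorical([], categories=[0, 1])
new_df = pd.DataFrame.from_dict({'a': [0, 1, 1, 1, 0, 0], 'b': [1, 1, 8, 4, 0, 0]})
# Give the incoming column the same categorical dtype before appending
new_df['a'] = pd.Categorical(new_df['a'], categories=[0, 1])
# append returns a new DataFrame; assign the result if you want to keep it.
# DataFrame.append was removed in pandas 2.0; there, use
# pd.concat([df, new_df], ignore_index=True) instead.
df.append(new_df, ignore_index=True)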
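On pandas 2.0 and later, where DataFrame.append no longer exists, the same idea can be sketched with pd.concat. This is a minimal sketch, not the original poster's code: the shared dtype name `binary` and the result name `combined` are made up for illustration, and column b is given an explicit int64 dtype here so that both frames match.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical shared dtype so both frames agree on the categories (0 and 1)
binary = CategoricalDtype(categories=[0, 1])

# Empty frame with a categorical column 'a' and an integer column 'b'
df = pd.DataFrame({
    "a": pd.Categorical([], categories=[0, 1]),
    "b": pd.Series([], dtype="int64"),
})

new_df = pd.DataFrame({"a": [0, 1, 1, 1, 0, 0], "b": [1, 1, 8, 4, 0, 0]})
# Convert the incoming column to the same categorical dtype before combining
new_df["a"] = new_df["a"].astype(binary)

# concat returns a new frame; the inputs are left unchanged
combined = pd.concat([df, new_df], ignore_index=True)
print(combined.dtypes)
```

Because both a columns carry identical categories, the combined column should stay categorical instead of falling back to a plain numeric or object column.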
Hope this helps.
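As for whether Categorical is more efficient for binary 0/1 values, one way to check the memory side yourself is to compare memory_usage across dtypes. A categorical column stores small integer codes plus the list of categories, so it is typically smaller than int64 but not smaller than a plain int8 column. The sketch below uses made-up data (alternating 0s and 1s) and hypothetical variable names, and it says nothing about speed, which depends on the operations you run.

```python
import numpy as np
import pandas as pd

n = 100_000
# Made-up binary data: alternating 0 and 1
values = np.tile([0, 1], n // 2)

as_int64 = pd.Series(values, dtype="int64")
as_int8 = as_int64.astype("int8")
as_cat = as_int64.astype("category")

# Bytes used by the values only (index excluded)
sizes = {
    "int64": as_int64.memory_usage(index=False, deep=True),
    "int8": as_int8.memory_usage(index=False, deep=True),
    "category": as_cat.memory_usage(index=False, deep=True),
}
for name, size in sizes.items():
    print(name, size)
```

If memory is the only goal, casting a 0/1 column to int8 (or bool) may be simpler than Categorical and avoids the need to keep categories consistent when combining frames.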