I have two DataFrames with the same column headers, and I want to one-hot encode both of them. Encoding them one by one does not work for me. Instead, I want to append the two DataFrames, one-hot encode the combined frame, and then split it back into two DataFrames, each with its headers intact.
The code below one-hot encodes each DataFrame separately instead of combining them first and then encoding:
train = pd.get_dummies(train, columns= ['is_discount', 'gender', 'city'])
test = pd.get_dummies(test, columns= ['is_discount', 'gender', 'city'])
pandas' merge and concat can be used to combine DataFrames, including data loaded from different files. The join method combines DataFrames based on their index or on a key column.
The concat() function concatenates two DataFrames by appending the rows of one to the other (or the columns, with axis=1). The merge() function is the equivalent of the SQL JOIN clause; 'left', 'right', 'inner' and 'outer' joins are all possible.
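To make the difference concrete, here is a minimal sketch. The two small frames and their column names (`id`, `city`) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical example frames with the same column headers
left = pd.DataFrame({"id": [1, 2], "city": ["A", "B"]})
right = pd.DataFrame({"id": [2, 3], "city": ["B", "C"]})

# concat stacks the rows of `right` under the rows of `left`
stacked = pd.concat([left, right], ignore_index=True)

# merge matches rows on a key column, like a SQL INNER JOIN
joined = pd.merge(left, right, on="id", how="inner", suffixes=("_l", "_r"))

print(stacked)
print(joined)
```

For the one-hot encoding problem in the question, concat is the relevant tool, because the goal is to stack rows rather than to match rows on a key.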
Use concat with keys, encode the combined frame, then split it back by key, i.e.:
import pandas as pd
# Example DataFrames
train = pd.DataFrame({'x':[1,2,3,4]})
test = pd.DataFrame({'x':[4,2,5,0]})
# Concat with keys
temp = pd.get_dummies(pd.concat([train,test],keys=[0,1]), columns=['x'])
# Selecting data from multi index
train,test = temp.xs(0),temp.xs(1)
Output:
#Train
   x_0  x_1  x_2  x_3  x_4  x_5
0    0    1    0    0    0    0
1    0    0    1    0    0    0
2    0    0    0    1    0    0
3    0    0    0    0    1    0
#Test
   x_0  x_1  x_2  x_3  x_4  x_5
0    0    0    0    0    1    0
1    0    0    1    0    0    0
2    0    0    0    0    0    1
3    1    0    0    0    0    0
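Applied to the columns named in the question, the same approach might look like the sketch below. The sample rows and the extra `price` column are invented for illustration; only the column names `is_discount`, `gender` and `city` come from the question. `dtype=int` is passed so the dummy columns hold 0/1, since recent pandas versions default to booleans.

```python
import pandas as pd

cols = ["is_discount", "gender", "city"]

# Hypothetical sample data; replace with the real train/test frames
train = pd.DataFrame({
    "is_discount": [0, 1, 0],
    "gender": ["M", "F", "F"],
    "city": ["NY", "LA", "NY"],
    "price": [10.0, 12.5, 9.0],
})
test = pd.DataFrame({
    "is_discount": [1, 1],
    "gender": ["M", "M"],
    "city": ["SF", "LA"],
    "price": [11.0, 8.0],
})

# Stack both frames under labelled keys, then encode once
combined = pd.concat([train, test], keys=["train", "test"])
encoded = pd.get_dummies(combined, columns=cols, dtype=int)

# Split back by key; each part keeps its original row index
train_enc = encoded.xs("train")
test_enc = encoded.xs("test")

print(train_enc.columns.tolist())
```

Because the encoding runs once over the combined frame, both results share the same columns, including `city_SF`, which appears only in the test rows and is all zeros in the train part.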