I have a data_df that looks like:
price vehicleType yearOfRegistration gearbox powerPS model kilometer fuelType brand notRepairedDamage postalCode
0 18300 coupe 2011 manuell 190 NaN 125000 diesel audi ja 66954
1 9800 suv 2004 automatik 163 grand 125000 diesel jeep NaN 90480
2 1500 kleinwagen 2001 manuell 75 golf 150000 benzin volkswagen nein 91074
3 3600 kleinwagen 2008 manuell 69 fabia 90000 diesel skoda nein 60437
4 650 limousine 1995 manuell 102 3er 150000 benzin bmw ja 33775
I tried to convert the categorical columns (here just vehicleType) to dummies ("one-hot encoding"):
columns = ['vehicleType']  # , 'gearbox', 'model', 'fuelType', 'brand', 'notRepairedDamage']
for column in columns:
    dummies = pd.get_dummies(data_df[column], prefix=column)
    data_df.drop(columns=[column], inplace=True)
    data_df = data_df.add(dummies, axis='columns')
But the original data is missing:
brand fuelType gearbox kilometer model notRepairedDamage ... vehicleType_coupe vehicleType_kleinwagen vehicleType_kombi vehicleType_limousine vehicleType_suv yearOfRegistration
0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
So how do I replace a given column with its dummies?
For example, if your dataframe df has a categorical variable "Gender", you can create dummy variables with: df_dc = pd.get_dummies(df, columns=['Gender']). If you have several categorical variables, add each column name as a string to the list.
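Applied to the question's data, that call might look like the sketch below. The three-row frame is a hypothetical miniature of data_df (only price, vehicleType and gearbox are kept), and the name encoded is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical miniature of data_df with three of the original columns
data_df = pd.DataFrame({
    "price": [18300, 9800, 1500],
    "vehicleType": ["coupe", "suv", "kleinwagen"],
    "gearbox": ["manuell", "automatik", "manuell"],
})

# Encode both categorical columns in one call. The listed columns are
# replaced by their dummies; every other column is carried over unchanged.
encoded = pd.get_dummies(data_df, columns=["vehicleType", "gearbox"])

print(encoded.columns.tolist())
```

Because the dataframe itself is passed in, there is no separate drop or join step, and each dummy column is prefixed with the name of the column it came from.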
In this case, you need to turn your column of labels (e.g. ['cat', 'dog', 'bird', 'cat']) into separate columns of 0s and 1s. This is known as creating dummy (indicator) columns. pd.get_dummies() turns a categorical column (a column of labels) into indicator columns (columns of 0s and 1s).
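A minimal sketch of that conversion on the label list above. The names labels and indicators are illustrative, and dtype=int is passed only so the result is 0/1 integers regardless of the pandas version's default.

```python
import pandas as pd

# The column of labels from the example above
labels = pd.Series(["cat", "dog", "bird", "cat"], name="animal")

# One indicator column per distinct label; each row has a 1 in the
# column matching its label and 0 everywhere else
indicators = pd.get_dummies(labels, dtype=int)

print(indicators)
```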
DataFrame.append() appends the rows of another dataframe to the end of the given dataframe and returns a new dataframe. Columns that are not in the original dataframe are added as new columns, and the new cells are filled with NaN.
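The NaN-filling behaviour can be sketched as follows. This uses pd.concat rather than append(), since DataFrame.append was removed in pandas 2.0 and pd.concat stacks rows the same way; the two one-row frames are made up for illustration.

```python
import pandas as pd

# Two hypothetical frames that share only the "price" column
first = pd.DataFrame({"price": [18300], "brand": ["audi"]})
second = pd.DataFrame({"price": [9800], "model": ["grand"]})

# Stack the rows; the result has the union of the columns, and cells
# with no source value are filled with NaN
stacked = pd.concat([first, second], ignore_index=True)

print(stacked)
```

Note that this adds rows, not columns, so it is not the operation needed to attach dummy columns next to the original data.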
# Get one hot encoding of columns 'vehicleType'
one_hot = pd.get_dummies(data_df['vehicleType'])
# Drop column as it is now encoded
data_df = data_df.drop('vehicleType', axis=1)
# Join the encoded df
data_df = data_df.join(one_hot)
data_df
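As for why the loop in the question produces only NaN: DataFrame.add() performs element-wise addition after aligning both frames on their labels, and a column that exists on only one side has nothing to be added to. The sketch below contrasts it with join() on a hypothetical two-column miniature of data_df; the names rest, added and joined are illustrative.

```python
import pandas as pd

# Hypothetical miniature of data_df
data_df = pd.DataFrame({
    "price": [18300, 9800, 1500],
    "vehicleType": ["coupe", "suv", "kleinwagen"],
})

one_hot = pd.get_dummies(data_df["vehicleType"], prefix="vehicleType")
rest = data_df.drop(columns=["vehicleType"])

# add() is arithmetic: the two frames share no column labels, so every
# column of the aligned result is NaN
added = rest.add(one_hot, axis="columns")

# join() puts the columns side by side, matching rows on the index
joined = rest.join(one_hot)

print(added.isna().all().all())
print(joined.columns.tolist())
```

Passing prefix= keeps the dummy column names tied to their source column, which helps once several columns are encoded.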