I am converting categorical data to numeric values for machine learning.
For example, the buying price of a car (the "buying" variable) takes one of four categories: "vhigh", "high", "med", "low". To convert it to numeric values, I used:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
buying = le.fit_transform(list(data["buying"]))
Is there a way to check exactly which number each label was mapped to? The assignment looks random to me (e.g. vhigh = 0, high = 2).
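The assignment is not random: LabelEncoder sorts the unique labels and numbers them in that order, and the fitted labels are stored in le.classes_. To see each label next to its code, you can add an extra column to a copy of your dataframe: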
# Separate dataframe that holds only the original labels and their codes
mapping_df = data[['buying']].copy()
# Passing .values (a NumPy array) avoids building a Python list first
mapping_df['buying_encoded'] = le.fit_transform(data['buying'].values)
Here's a full working example:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data = pd.DataFrame({'index': [0, 1, 2, 3, 4, 5, 6],
                     'buying': ['Luffy', 'Nami', 'Luffy', 'Franky', 'Sanji', 'Zoro', 'Luffy']})
# Encode the labels and store the codes next to the originals
data['buying_encoded'] = le.fit_transform(data['buying'].values)
# Keep one row per distinct label so the output reads as a lookup table
data = data.drop_duplicates('buying').set_index('index')
print(data)
Output:
       buying  buying_encoded
index
0       Luffy               1
1        Nami               2
3      Franky               0
4       Sanji               3
5        Zoro               4
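The mapping can also be read straight from the fitted encoder, without building a dataframe. The sketch below is a minimal illustration, assuming a small made-up sample of the "buying" column (buying_labels is hypothetical, not the asker's real data). It relies on the classes_ attribute, which holds the unique labels in sorted order, so a label's position in that array is its code.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample standing in for data["buying"] from the question
buying_labels = ["vhigh", "high", "med", "low", "med", "vhigh"]

le = LabelEncoder()
encoded = le.fit_transform(buying_labels)

# classes_ lists the unique labels in sorted order; the position of a
# label in that array is the integer it was encoded to.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'high': 0, 'low': 1, 'med': 2, 'vhigh': 3}

# inverse_transform turns codes back into the original labels
decoded = le.inverse_transform(encoded)
print([str(label) for label in decoded])
```

Because the order is alphabetical, it does not follow the natural price order (low < med < high < vhigh); if that order matters for the model, the codes would have to be assigned another way.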