I'm new to data analytics and am trying out some models in Python with scikit-learn (sklearn). Some of the columns in my dataset contain text values.
Is there a way to convert these column values into numbers in pandas or sklearn? Is assigning numbers to these values the right approach? And what happens if a new string shows up in the test data?
Please advise.
Consider using label encoding, which transforms categorical data by assigning each category an integer between 0 and num_of_categories - 1:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame(['a','b','c','d','a','c','a','d'], columns=['letter'])
letter
0 a
1 b
2 c
3 d
4 a
5 c
6 a
7 d
Applying the encoder:
le = LabelEncoder()
# fit_transform runs once per column, so the encoder is refit for each column
encoded_series = df.apply(le.fit_transform)
encoded_series:
letter
0 0
1 1
2 2
3 3
4 0
5 2
6 0
7 3
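On the question of new strings in the test data: LabelEncoder raises a ValueError when transform sees a label it was not fitted on. One possible workaround, sketched below, is OrdinalEncoder with handle_unknown='use_encoded_value', which maps unseen categories to a placeholder code. This assumes scikit-learn 0.24 or later; the train/test frames and the choice of -1 as the placeholder are illustrative assumptions, not part of the original example. Note also that integer codes imply an order (a < b < c), which some models may misread, so one-hot encoding is often preferred for unordered categories.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical train/test split; the 'letter' column mirrors the example above.
train = pd.DataFrame({'letter': ['a', 'b', 'c', 'd', 'a', 'c', 'a', 'd']})
test = pd.DataFrame({'letter': ['b', 'd', 'e']})  # 'e' never appears in train

# Assumption: scikit-learn >= 0.24, where 'use_encoded_value' is available.
# Unseen categories are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)

# Learn the category-to-integer mapping from the training data only.
train_codes = encoder.fit_transform(train[['letter']])

# Reuse the same mapping on the test data.
test_codes = encoder.transform(test[['letter']])

# OrdinalEncoder returns a 2D float array; flatten it to plain integers.
train_encoded = train_codes.ravel().astype(int).tolist()
test_encoded = test_codes.ravel().astype(int).tolist()

print(train_encoded)
print(test_encoded)
```

Fitting on the training data only and reusing the fitted encoder on the test data keeps the mapping consistent between the two sets.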