 

Encoding a column of string values as integers in Python or scikit-learn

How can I encode string-valued columns in a data table as integers? For example, I have two feature variables: color (possible string values R, G and B) and skills (possible string values C++, Java, SQL and Python). The data table has two columns:

Color -> R G B B G R B G G R G
Skills -> Java, C++, SQL, Java, Python, Python, SQL, C++, Java, SQL, Java

Which sklearn function or method will transform the two columns above using R=0, G=1, B=2 for color and C++=0, Java=1, SQL=2, Python=3 for skills, giving:

Color: 0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1
Skills:  1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1
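
The target encoding above is just a fixed lookup table per column. As a minimal plain-pandas sketch (the dictionary names color_map and skill_map are hypothetical, chosen only for illustration), Series.map can apply such a table directly:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "colors": ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    "skills": ["Java", "C++", "SQL", "Java", "Python", "Python",
               "SQL", "C++", "Java", "SQL", "Java"],
})

# Hypothetical lookup tables that follow the requested order exactly
color_map = {"R": 0, "G": 1, "B": 2}
skill_map = {"C++": 0, "Java": 1, "SQL": 2, "Python": 3}

# Series.map replaces each value via the dict; values missing from the dict become NaN
encoded = pd.DataFrame({
    "colors": df["colors"].map(color_map),
    "skills": df["skills"].map(skill_map),
})

print(encoded["colors"].tolist())  # [0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1]
print(encoded["skills"].tolist())  # [1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1]
```
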

How can I do this?

Chandra asked Oct 15 '25 at 21:10


1 Answer

Use scikit-learn's LabelEncoder:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

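def encode_df(dataframe):
    """Label-encode every column; return the encoded copy and the fitted encoders."""
    encoded = dataframe.copy()
    encoders = {}
    for column in dataframe.columns:
        # A separate encoder per column keeps each mapping available for inverse_transform
        le = LabelEncoder()
        encoded[column] = le.fit_transform(dataframe[column])
        encoders[column] = le
    return encoded, encoders

# encode the dataframe and show the result
encoded_df, encoders = encode_df(df)
print(encoded_df)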
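
One caveat: LabelEncoder numbers the classes in sorted order, so the code above yields B=0, G=1, R=2 and C++=0, Java=1, Python=2, SQL=3 rather than the exact order asked for in the question. A sketch that reproduces the requested order, assuming scikit-learn's OrdinalEncoder with an explicit categories list (a swapped-in alternative, not part of the original answer), might look like this:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "colors": ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    "skills": ["Java", "C++", "SQL", "Java", "Python", "Python",
               "SQL", "C++", "Java", "SQL", "Java"],
})

# One list per column; each value's position in its list becomes its integer code
enc = OrdinalEncoder(categories=[["R", "G", "B"],
                                 ["C++", "Java", "SQL", "Python"]])

# fit_transform returns a float array, so cast to int for clean integer codes
codes = enc.fit_transform(df[["colors", "skills"]]).astype(int)
encoded = pd.DataFrame(codes, columns=["colors", "skills"])

print(encoded["colors"].tolist())  # [0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1]
print(encoded["skills"].tolist())  # [1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1]

# inverse_transform maps the integer codes back to the original strings
decoded = enc.inverse_transform(codes)
print(decoded[0].tolist())  # ['R', 'Java']
```

Because the category order is given explicitly, the encoding no longer depends on alphabetical sorting, and the same fitted encoder can be reused with transform on new rows.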
Mir Ilias answered Oct 17 '25 at 12:10