 

Convert categorical variables from String to int representation

I have a NumPy array of text classifications stored as strings, i.e. y_train = ['A', 'B', 'A', 'C', ...]. I am trying to apply scikit-learn's Multinomial Naive Bayes (MultinomialNB) algorithm to predict classes for the entire dataset.

I want to convert the string classes into integers so I can feed them into the algorithm, i.e. convert ['A', 'B', 'A', 'C', ...] into [1, 2, 1, 3, ...].

I could write a for loop that goes through the array and builds a new one with integer class labels, but is there a direct function to achieve this?

asked Dec 10 '16 at 16:12 by Abhi
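A minimal sketch of a "direct function" using only NumPy, assuming y_train is a 1-D array of strings as in the question. np.unique with return_inverse=True returns the sorted distinct labels plus each element's 0-based position in that sorted list; adding 1 gives the 1-based codes shown in the question. The variable names below are illustrative only.

```python
import numpy as np

# Example labels shaped like the question's y_train
y_train = np.array(['A', 'B', 'A', 'C'])

# classes: sorted distinct labels; y_codes: 0-based index of each label in classes
classes, y_codes = np.unique(y_train, return_inverse=True)

# Shift to 1-based codes to match the [1, 2, 1, 3, ...] example
y_codes_1based = y_codes + 1

# Indexing classes with the 0-based codes recovers the original strings
y_restored = classes[y_codes]
```

scikit-learn's sklearn.preprocessing.LabelEncoder offers the same mapping through fit_transform and inverse_transform. Also note that scikit-learn classifiers such as MultinomialNB generally accept string class labels for y directly, so this conversion may not be strictly required for the target.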


1 Answer

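Another option is to use astype('category').cat.codes on the DataFrame column to convert the string values into integer codes. Note that the codes start at 0 (following the sorted order of the categories), not at 1.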

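# .copy() avoids pandas' SettingWithCopyWarning when assigning into the column subset
X = df[['User ID', 'Gender', 'Age', 'EstimatedSalary']].copy()
# Replace each Gender string with its integer category code (0, 1, ...)
X['Gender'] = X['Gender'].astype('category').cat.codes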
answered Sep 23 '22 at 00:09 by Golden Lion
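A self-contained sketch of the cat.codes approach from the answer, assuming a small hypothetical DataFrame that reuses the answer's column names (the values are made up for illustration). It keeps the category list so the integer codes can be decoded later, and applies the same idea to labels like the question's y_train, adding 1 to get 1-based codes.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer, values are invented
df = pd.DataFrame({
    'User ID': [101, 102, 103, 104],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Age': [19, 35, 26, 27],
    'EstimatedSalary': [19000, 20000, 43000, 57000],
})

X = df[['User ID', 'Gender', 'Age', 'EstimatedSalary']].copy()

# Categories are the sorted unique strings: Female -> 0, Male -> 1
gender = X['Gender'].astype('category')
X['Gender'] = gender.cat.codes

# Keep the category list so codes can be mapped back to strings later
categories = gender.cat.categories
decoded = list(categories[X['Gender'].to_numpy()])

# Same technique on labels shaped like the question's y_train, shifted to 1-based
y_codes = pd.Series(['A', 'B', 'A', 'C']).astype('category').cat.codes + 1
```

Missing values (NaN) receive the code -1 under cat.codes, so check for them before passing the codes to a model.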