Categorical & Numerical Features - Categorical Target - Scikit Learn - Python

I have a data set containing both categorical and numerical columns, and my target column is also categorical. I am using the scikit-learn library in Python 3.4. I know that scikit-learn needs all categorical values to be transformed to numerical values before applying any machine learning approach.

How should I transform my categorical columns to numerical values? I have tried a lot of things, but I keep getting different errors, such as errors about "str" objects and "'numpy.ndarray' object has no attribute 'items'".

Here is an example of my data:
 UserID  LocationID   AmountPaid    ServiceID   Target
 29876      IS345       23.9876      FRDG        JFD
 29877      IS712       135.98       WERS        KOI

My dataset is saved in a CSV file. Here is the short code I wrote to give you an idea of what I want to do:

import pandas as pd

# Read the CSV file
data_dir = 'C:/Users/davtalab/Desktop/data/'
train_file = data_dir + 'train.csv'
train = pd.read_csv(train_file)

# Numeric columns
x_numeric_cols = train['AmountPaid']

# Categorical columns (a list of separate names, not one concatenated string)
categorical_cols = ['UserID', 'LocationID', 'ServiceID']
x_cat_cols = train[categorical_cols].values

# Target column
y_target = train['Target'].values

I need x_cat_cols to be converted to numeric values and then combined with x_numeric_cols, so that I have my complete input (x) values.

Then I need to convert my target column into numeric values as well and use that as my final target (y) column.

Then I want to do a Random Forest using these two complete sets as:

from sklearn.ensemble import RandomForestClassifier as RF

rf = RF(n_estimators=n_trees, max_features=max_features, verbose=verbose, n_jobs=n_jobs)
rf.fit(x_train, y_train)

Thanks for your help!

USC.Trojan asked May 16 '15 at 02:05

1 Answer

For the target, you can use sklearn's LabelEncoder. This gives you a converter from string labels to numeric ones (and also a reverse mapping, via inverse_transform). The scikit-learn documentation for LabelEncoder has a usage example.
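As a minimal sketch of the target encoding described above: the values below are made up to resemble the Target column in the question, and the variable names (y_raw, label_encoder) are only illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Toy target values modelled on the question's 'Target' column (illustrative only)
y_raw = ["JFD", "KOI", "JFD", "KOI"]

label_encoder = LabelEncoder()

# fit_transform learns the distinct labels and returns one integer code per row
y_encoded = label_encoder.fit_transform(y_raw)

# inverse_transform maps the integer codes back to the original strings,
# which is useful for turning model predictions back into readable labels
y_decoded = label_encoder.inverse_transform(y_encoded)

print(label_encoder.classes_)
print(y_encoded)
print(list(y_decoded))
```

The fitted encoder should be kept around after training so that predictions can be decoded with the same mapping.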

As for features, learning algorithms in general treat numeric inputs as ordinal data, so encoding categories as arbitrary integers would suggest an ordering that does not exist. The best option is therefore to use OneHotEncoder to convert the categorical features. This generates a new binary feature for each category, denoting on/off for that category. Again, the scikit-learn documentation has a usage example.
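Putting both encoders together for the data layout in the question might look roughly like the sketch below. It is not the only way to do it, and it rests on a few assumptions: a scikit-learn version recent enough to provide ColumnTransformer and to one-hot encode string columns directly (0.20 or later), a tiny made-up DataFrame in place of train.csv, and arbitrary forest settings (n_estimators=10, random_state=0).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up rows standing in for train.csv (assumption, not the asker's data)
train = pd.DataFrame({
    "UserID": [29876, 29877, 29876, 29878],
    "LocationID": ["IS345", "IS712", "IS345", "IS712"],
    "AmountPaid": [23.9876, 135.98, 40.0, 99.5],
    "ServiceID": ["FRDG", "WERS", "FRDG", "WERS"],
    "Target": ["JFD", "KOI", "JFD", "KOI"],
})

categorical_cols = ["UserID", "LocationID", "ServiceID"]
numeric_cols = ["AmountPaid"]

# One-hot encode the categorical columns; pass the numeric column through unchanged.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
)

x_train = preprocess.fit_transform(train[categorical_cols + numeric_cols])
y_train = LabelEncoder().fit_transform(train["Target"])

# Arbitrary settings chosen for the sketch
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(x_train, y_train)

print(x_train.shape)
```

Note that one-hot encoding an identifier such as UserID creates one column per distinct user, so on a real data set it may be worth checking how many distinct values each categorical column has before encoding it this way.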

Ando Saabas answered Sep 30 '22 at 05:09
