Categorical & Numerical Features - Categorical Target - Scikit Learn - Python

I have a data set containing both categorical and numerical columns, and my target column is also categorical. I am using the scikit-learn library with Python 3.4. I know that scikit-learn needs all categorical values to be transformed to numerical values before any machine learning algorithm can be applied.

How should I transform my categorical columns to numerical values? I tried a lot of things, but I keep getting errors such as "'str' object has no ..." and "'numpy.ndarray' object has no attribute 'items'".

Here is an example of my data:
 UserID  LocationID   AmountPaid    ServiceID   Target
 29876      IS345       23.9876      FRDG        JFD
 29877      IS712       135.98       WERS        KOI

My dataset is saved in a CSV file. Here is the short code I wrote to give you an idea of what I want to do:

import pandas as pd

# Read the CSV file
data_dir = 'C:/Users/davtalab/Desktop/data/'
train_file = data_dir + 'train.csv'
train = pd.read_csv(train_file)

# Numeric columns
x_numeric_cols = train['AmountPaid']

# Categorical columns: a list of names (joining them with '+' would build one long string)
categorical_cols = ['UserID', 'LocationID', 'ServiceID']
x_cat_cols = train[categorical_cols].values  # .as_matrix() was removed in later pandas versions

# Target column
y_target = train['Target'].values

I need x_cat_cols to be converted to numeric values and then combined with x_numeric_cols, so that I have my complete input (x) values.

Then I need to convert my target column into numeric values as well and use that as my final target (y) column.

Then I want to fit a random forest on these two complete sets:

from sklearn.ensemble import RandomForestClassifier as RF

rf = RF(n_estimators=n_trees, max_features=max_features, verbose=verbose, n_jobs=n_jobs)
rf.fit(x_train, y_train)

Thanks for your help!

asked May 16 '15 02:05

USC.Trojan

1 Answer

For the target, you can use sklearn's LabelEncoder. It gives you a converter from string labels to numeric ones, and also the reverse mapping. The scikit-learn documentation for LabelEncoder includes an example.
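As a minimal sketch of the target encoding described above: the labels below are taken from the sample rows in the question, with one repeated for illustration, and the variable names are only placeholders.

```python
from sklearn.preprocessing import LabelEncoder

# Target labels from the question's sample rows (one repeated for illustration)
y_raw = ["JFD", "KOI", "JFD"]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y_raw)       # string labels -> integer codes
y_back = label_encoder.inverse_transform(y)  # integer codes -> original strings

print(list(label_encoder.classes_))  # the learned classes, in sorted order
```

Keeping the fitted encoder around lets you map the model's numeric predictions back to the original label strings with inverse_transform.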

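As for the features, learning algorithms in general treat numeric inputs as ordered quantities, so simply replacing each category with an integer code would suggest an ordering between categories that does not exist. For that reason OneHotEncoder is usually a better option for converting the categorical features. It generates a new binary feature for each category, which is 1 when a row belongs to that category and 0 otherwise. Again, the scikit-learn documentation for OneHotEncoder includes a usage example.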
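One possible way to put the pieces together is sketched below. It assumes a reasonably recent scikit-learn (0.20 or later), where OneHotEncoder accepts string columns directly and ColumnTransformer is available; neither was the case when the question was asked. The small data frame is made up to resemble the sample in the question, and n_estimators=10 is an arbitrary choice.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up rows shaped like the question's sample (only the first two come from it)
train = pd.DataFrame({
    "UserID": [29876, 29877, 29876, 29878],
    "LocationID": ["IS345", "IS712", "IS345", "IS712"],
    "AmountPaid": [23.9876, 135.98, 50.0, 75.5],
    "ServiceID": ["FRDG", "WERS", "WERS", "FRDG"],
    "Target": ["JFD", "KOI", "JFD", "KOI"],
})

categorical_cols = ["UserID", "LocationID", "ServiceID"]
numeric_cols = ["AmountPaid"]

# One-hot encode the categorical columns and pass the numeric column through unchanged
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)
x_train = preprocess.fit_transform(train[categorical_cols + numeric_cols])

# Encode the string target as integers
y_train = LabelEncoder().fit_transform(train["Target"])

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(x_train, y_train)
```

Here the three categorical columns expand into 3 + 2 + 2 binary columns, and AmountPaid is appended as the last column. With handle_unknown="ignore", categories that were not seen during fitting are encoded as all zeros at prediction time instead of raising an error.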

answered Sep 30 '22 05:09

Ando Saabas

Tags: python, target, scikit-learn, categorical-data, numerical
