
Convert Pandas Dataframe to numpy for sklearn

I am new to Python and sklearn. I have a pandas data frame of the Titanic dataset, and I want to use it for logistic regression prediction with sklearn.

I tried the following:

data_np = data.astype(np.int32).values

But it does not work. I want to use several features from the dataset, such as 'Pclass', 'Age', 'Sex', etc.

I want to convert the entire data frame, as well as single columns such as data["Age"], to the numpy format that sklearn expects. Any help would be appreciated.

Seja Nair asked Jan 08 '23 14:01


1 Answer

Categorical variables like 'Sex' and 'Embarked' need to be one-hot encoded before they can be used in a LogisticRegression model. With pandas you can use get_dummies for this, e.g. pd.get_dummies(data['Sex']).

There is a full tutorial that covers specifically this issue on the same dataset here:

http://nbviewer.ipython.org/github/ogrisel/parallel_ml_tutorial/blob/master/rendered_notebooks/04%20-%20Pandas%20and%20Heterogeneous%20Data%20Modeling.ipynb
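
As a rough illustration of the get_dummies approach, here is a minimal sketch. The small data frame is a made-up stand-in for the Titanic data (only the column names 'Pclass', 'Sex', 'Age' come from the question; 'Survived' as the target and all values are assumptions). Filling missing ages with the median is just one simple choice, not the only option. A direct astype(np.int32) on the whole frame will typically fail on string columns such as 'Sex' and on missing values, which is why the columns are prepared first.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the Titanic data frame; all values are made up.
data = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 2, 1, 3],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
})

# Start from the numeric columns and fill missing ages
# (median imputation is an assumption made for this sketch).
features = data[["Pclass", "Age"]].copy()
features["Age"] = features["Age"].fillna(features["Age"].median())

# One-hot encode the categorical 'Sex' column and append the new columns.
sex_dummies = pd.get_dummies(data["Sex"], prefix="Sex")
features = pd.concat([features, sex_dummies], axis=1)

# Whole feature table as a 2-D float array, target as a 1-D array.
X = features.to_numpy(dtype=np.float64)
y = data["Survived"].to_numpy()

# A single column converts the same way (1-D array, NaN kept as-is).
age = data["Age"].to_numpy()

# Fit a logistic regression on the prepared arrays.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
predictions = model.predict(X)
```

X has one row per passenger and one column per feature, which is the 2-D layout sklearn estimators expect; y and age are 1-D arrays. 'Embarked' could be encoded the same way as 'Sex'.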

ogrisel answered Jan 14 '23 20:01