I am new to Python and scikit-learn. I have a pandas DataFrame of the Titanic dataset, and I want to use it for logistic regression prediction with sklearn.
I tried the following:
data_np = data.astype(np.int32).values
But it does not work. I want to use different features from the dataset, such as 'Pclass', 'Age', 'Sex', etc.
I want to convert the entire DataFrame, as well as single columns such as data["Age"], to the NumPy format that sklearn expects. Any help is appreciated.
Categorical variables like 'Sex' and 'Embarked' need to be one-hot encoded before you can use them in a LogisticRegression model. With pandas you can use pd.get_dummies(data['Sex']).
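A minimal sketch of the whole workflow, assuming a tiny hypothetical DataFrame that only mimics a few Titanic columns (the rows are made up, not real data). It uses the columns= argument of pd.get_dummies to encode 'Sex' and 'Embarked' in one call, converts the result with DataFrame.to_numpy, and fits LogisticRegression. Filling missing ages with the median is an assumption added here, not something the answer prescribes; LogisticRegression simply cannot accept NaN values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy rows mimicking a few Titanic columns; NOT real data.
data = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 27.0],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", "S", "Q", "S"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Assumption: fill missing ages with the median, since LogisticRegression
# rejects NaN values.
data["Age"] = data["Age"].fillna(data["Age"].median())

# A single column: double brackets keep a 2-D (n_samples, 1) array,
# which is the shape sklearn expects for a feature matrix.
age_column = data[["Age"]].to_numpy()

# One-hot encode the string columns; numeric columns pass through unchanged.
features = pd.get_dummies(
    data[["Pclass", "Age", "Sex", "Embarked"]],
    columns=["Sex", "Embarked"],
)

# Convert the whole feature frame to a float NumPy array for sklearn.
X = features.to_numpy(dtype=np.float64)
y = data["Survived"].to_numpy()

model = LogisticRegression()
model.fit(X, y)

print(features.columns.tolist())
print(X.shape)
```

Note that data["Age"].to_numpy() (single brackets) returns a 1-D array, which is fine as a target y but must be reshaped, e.g. with .reshape(-1, 1), before it can be used as a feature matrix. Calling data.astype(np.int32) on the raw frame fails because string values such as 'male' cannot be cast to integers, which is why the encoding step comes first.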
There is a full tutorial that covers specifically this issue on the same dataset here:
http://nbviewer.ipython.org/github/ogrisel/parallel_ml_tutorial/blob/master/rendered_notebooks/04%20-%20Pandas%20and%20Heterogeneous%20Data%20Modeling.ipynb