How to deal with combination of text and numeric features?

Question

Looking at Kaggle's Job Salary Prediction, I see numeric features (like Category) and textual ones (like FullDescription).

How do I go about training on such data? I thought about vectorizing the text using TfidfTransformer; however, it creates a sparse matrix, which many learning algorithms (such as RandomForestRegressor) refuse to work with. Also, once I have the feature vector for the text, how do I combine it with the other features?

Any pointers on how to work with such data?

Thanks!

ogrisel · Accepted Answer

I would first fit a linear model on the tf-idf features of each text field independently. Then I would add each linear model's predictions as an additional feature alongside the other features, and train an ExtraTreesRegressor or GradientBoostingRegressor on the combined features.
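
A minimal sketch of that idea, under some assumptions of mine rather than anything stated in the answer: Ridge is used as the linear model, the linear predictions are generated out-of-fold with cross_val_predict so the tree model does not see predictions fitted on its own targets, and the toy data (a hypothetical second text field `titles`, integer category codes, salaries) is made up purely for illustration. The helper name `stack_text_predictions` is also hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def stack_text_predictions(text_columns, numeric_features, y, cv=3):
    """Return the numeric features with one extra column per text field.

    Each extra column holds out-of-fold predictions of a tf-idf + Ridge
    model fitted on that text field alone (hypothetical helper).
    """
    columns = [np.asarray(numeric_features, dtype=float)]
    for texts in text_columns:
        # The linear model handles the sparse tf-idf matrix directly.
        text_model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
        # Out-of-fold predictions: an assumption, not part of the answer.
        oof = cross_val_predict(text_model, texts, y, cv=cv)
        columns.append(oof.reshape(-1, 1))
    # The result is small and dense, so tree ensembles accept it.
    return np.hstack(columns)


# Made-up toy data standing in for the real competition columns.
descriptions = [
    "senior python developer building data pipelines",
    "junior accountant preparing monthly reports",
    "lead software engineer for backend services",
    "office assistant handling calls and filing",
    "data scientist training regression models",
    "warehouse operative loading and packing goods",
]
titles = [
    "python developer",
    "junior accountant",
    "software engineer",
    "office assistant",
    "data scientist",
    "warehouse operative",
]
category = np.array([[0], [1], [0], [2], [0], [3]])  # integer-coded Category
salary = np.array([60000.0, 28000.0, 75000.0, 22000.0, 65000.0, 20000.0])

X_combined = stack_text_predictions([descriptions, titles], category, salary)

forest = ExtraTreesRegressor(n_estimators=50, random_state=0)
forest.fit(X_combined, salary)
predictions = forest.predict(X_combined)
```

Here X_combined has one column for Category plus one column per text field, so the sparse tf-idf matrices never reach the tree model. On new data you would refit each text pipeline on the full training set and use its predict output as the extra columns.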

Tags:

python

scikit-learn

lazy1
