Looking at Kaggle's Job Salary Prediction, I see numeric features (like Category) and textual ones (like FullDescription).
How do I go about training on such data? I thought about vectorizing the text with TfidfTransformer, but it produces a sparse matrix, which many learning algorithms (such as RandomForestRegressor) refuse to work with. Also, once I have the feature vector for the text, how do I combine it with the other features?
Any pointers on how to work with such data?
Thanks!
I would first fit a linear model on the tf-idf features of each text field independently, then add each linear model's predictions as an additional feature alongside the other features, and train an ExtraTreesRegressor or GradientBoostingRegressor on the combined features.
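Here is a minimal sketch of that idea with scikit-learn. The toy data, the column choices, the helper name `text_model_predictions`, and the use of Ridge as the linear model are all assumptions made for illustration. The sketch also uses out-of-fold predictions (via `cross_val_predict`) so the tree model does not see predictions made on rows the linear model was trained on; that detail is my addition, not something stated above.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real data (made up for illustration).
titles = np.array([
    "senior python developer", "junior nurse", "python data engineer",
    "staff nurse", "lead java developer", "care assistant",
])
descriptions = np.array([
    "build backend services in python", "hospital ward care duties",
    "design data pipelines in python", "patient care on night shifts",
    "lead a team building java services", "support residents in a care home",
])
# Two hypothetical non-text columns, e.g. an encoded category and a flag.
numeric = np.array([[0, 1], [1, 0], [0, 1], [1, 1], [0, 1], [1, 0]], dtype=float)
salary = np.array([60000, 25000, 55000, 30000, 70000, 18000], dtype=float)


def text_model_predictions(texts, y, cv=3):
    """Out-of-fold predictions of a tf-idf + linear model for one text field."""
    model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
    return cross_val_predict(model, texts, y, cv=cv)


# One extra dense column per text field, holding the linear model's prediction.
text_features = np.column_stack(
    [text_model_predictions(field, salary) for field in (titles, descriptions)]
)

# Dense combined matrix: original features plus the stacked predictions.
X = np.hstack([numeric, text_features])

forest = ExtraTreesRegressor(n_estimators=50, random_state=0)
forest.fit(X, salary)
```

Because each text field is reduced to a single dense column, the sparse tf-idf matrix never reaches the tree ensemble. To score new rows, you would refit each text pipeline on all the training data and use its `predict` output as the extra column.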