If using a library like scikit-learn, how do I assign more weight to certain features in the input to a classifier like SVM? Is this something people do, or is there another solution to my problem?
The weighted moving average (WMA) is a technical indicator that assigns a greater weighting to the most recent data points, and less weighting to data points in the distant past. The WMA is obtained by multiplying each number in the data set by a predetermined weight and summing up the resulting values.
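The WMA calculation described above can be sketched in a few lines of Python; the function name and sample values here are illustrative, not from any particular library:

```python
def wma(data, weights):
    """Weighted moving average: multiply each value by its weight,
    sum the products, then divide by the sum of the weights."""
    assert len(data) == len(weights)
    return sum(d * w for d, w in zip(data, weights)) / sum(weights)

# Most recent price gets the largest weight.
prices = [10.0, 11.0, 12.0]
weights = [1, 2, 3]
print(wma(prices, weights))  # (10*1 + 11*2 + 12*3) / 6 ≈ 11.333
```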
The strategy used in this work is feature weighting, which seeks to estimate the relative importance of each feature (with respect to the classification task) and assign it a corresponding weight. When properly weighted, an important feature receives a larger weight than less important or irrelevant features.
Feature weighting is an important phase of text categorization: it computes a weight for each feature of a document. This paper proposes three new feature weighting methods for text categorization.
These weights can be used to calculate a weighted average: multiply each prediction by its model's weight to give a weighted sum, then divide the result by the sum of the weights. For example: yhat = ((97.2 * 0.84) + (100.0 * 0.87) + (95.8 * 0.75)) / (0.84 + 0.87 + 0.75) ≈ 97.763
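The worked example above translates directly into code; the prediction and weight values are the ones given in the text:

```python
# Weighted average of model predictions, using each model's
# weight (e.g. a skill score) as its contribution.
preds = [97.2, 100.0, 95.8]
weights = [0.84, 0.87, 0.75]

yhat = sum(p * w for p, w in zip(preds, weights)) / sum(weights)
print(round(yhat, 3))  # ≈ 97.763
```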
First of all - you should probably not do it. The whole concept of machine learning is to use statistical analysis to assign optimal weights. You are interfering with that here, so you need really strong evidence that this weighting is crucial to the process you are trying to model, and that for some reason your model is currently missing it.
That being said - there is no general answer. This is purely model-specific. Some models will let you weight features: in a random forest you could bias the distribution from which features are sampled towards the ones you are interested in; in SVM it is enough to multiply a given feature by a constant. Remember when you were told to normalize your features for SVM? This is why - you can use the scale of features to 'steer' your classifier towards particular features, since the ones with larger values will be preferred. This will actually work for any weight norm-regularized model (regularized logistic regression, ridge regression, lasso, etc.).
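The SVM trick described above - standardize, then rescale the column you want to emphasize - can be sketched as follows. The synthetic data, the choice of feature index, and the weight of 3.0 are all illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy data standing in for your real feature matrix.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Standardize first so every feature starts on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Multiply the column we want to emphasize by a constant > 1;
# a norm-regularized model will then lean on it more heavily.
feature_weights = np.array([1.0, 1.0, 3.0, 1.0])  # boost feature 2
X_weighted = X_scaled * feature_weights

clf = LinearSVC(random_state=0, max_iter=5000).fit(X_weighted, y)
```

Note that the same element-wise multiplication applies unchanged to ridge, lasso, or regularized logistic regression, since all of them penalize the weight norm.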