I have historical purchase data for some 10k customers for 3 months, I want to use that data for making predictions about their purchase in next 3 months. I am using Customer ID as input variable, as I want xgboost to learn for individual spendings among different categories. Is there a way to tweak, so that emphasis is to learn more based on the each Individual purchase? Or better way of addressing this problem?
You can use weight vector which you can pass in weight
argument in xgboost; a vector of size equal to nrow(trainingData)
. However This is generally used to penalize mistake in classification mistake (think of sparse data with items which just sale say once in month or so; you want to learn the sales then you need to give more weight to sales instance or else all prediction will be zero). Apparently you are trying to tweak
weight of independent variable which I am not able to understand well.
Learning the behavior of dependent variable (sales in your case) is what machine learning model do, you should let it do its job. You should not tweak it to force learn from some feature only. For learning purchase behavior clustering type of unsupervised techniques will be more useful.
To include user specific behavior first take will be to do clustering and identify under-indexed and over-indexed categories for each user. Then you can create some categorical feature using these flags.
PS: Some data to explain your problem can help others to help you better.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With