I was trying to use the NaiveBayesUpdateable classifier from Weka. My data contains both nominal and numeric attributes:
@relation cars
@attribute country {FR, UK, ...}
@attribute city {London, Paris, ...}
@attribute car_make {Toyota, BMW, ...}
@attribute price numeric %% car price
@attribute sales numeric %% number of cars sold
I need to predict the number of sales (numeric!) based on other attributes.
I understand that I cannot use a numeric attribute as the class for Bayes classification in Weka. One technique is to split the range of the numeric attribute into N intervals of length k and use a nominal attribute instead, where each interval number n is a class name, like this: @attribute class {1,2,3,...N}.
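To make the interval idea concrete, here is a minimal sketch in plain Python (not Weka code). The function name `sales_to_bin` and the choice of 10 intervals are assumptions made only for illustration; the 0 to 1 000 000 range is the one stated in the question.

```python
def sales_to_bin(sales: float, width: float, n_bins: int) -> int:
    """Map a non-negative numeric value to a 1-based interval label in 1..n_bins."""
    if sales < 0:
        raise ValueError("sales must be non-negative")
    index = int(sales // width)        # 0-based interval index
    return min(index, n_bins - 1) + 1  # clamp the upper edge, then make it 1-based


# Assumed for illustration: values in 0..1_000_000 split into 10 equal intervals.
N_BINS = 10
WIDTH = 1_000_000 / N_BINS

labels = [sales_to_bin(s, WIDTH, N_BINS) for s in (0, 99_999, 100_000, 1_000_000)]
print(labels)  # [1, 1, 2, 10]
```

With these assumed settings each class label covers 100 000 sales, so the number of classes N is chosen independently of how large the values get; the cost is that a predicted class only locates the value to within one interval.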
Yet the numeric attribute that I need to predict ranges from 0 to 1 000 000. Creating 1 000 000 classes makes no sense at all. How can I predict a numeric attribute with Weka, or which algorithms should I look at if Weka has no tools for this task?
What you want to do is regression, not classification. The difference is exactly what you describe: classification predicts a nominal class, while regression predicts a numeric value.
Most regression-based techniques can be turned into a binary classifier by defining a threshold: the class is determined by whether the predicted value is above or below that threshold.
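The thresholding step can be sketched as follows in plain Python. The predicted values, the threshold of 100 000, and the class names "above" and "below" are all hypothetical; in practice the predictions would come from whatever regression model you trained.

```python
def classify_by_threshold(predicted_value: float, threshold: float) -> str:
    """Turn a numeric prediction into one of two classes.

    A value exactly equal to the threshold is assigned to "below" here;
    that tie-breaking rule is an arbitrary choice.
    """
    return "above" if predicted_value > threshold else "below"


# Hypothetical regression outputs (predicted number of sales).
predicted_sales = [120.0, 50_000.0, 730_000.0]
THRESHOLD = 100_000.0

classes = [classify_by_threshold(p, THRESHOLD) for p in predicted_sales]
print(classes)  # ['below', 'below', 'above']
```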
I don't know all of WEKA's classifiers that offer regression, but you can start by looking at those two:
You might have to use the NominalToBinary filter to convert your nominal attributes to numerical (binary) ones.
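As a rough illustration of what converting a nominal attribute to binary ones means, here is a plain-Python sketch of the one-indicator-per-value idea. It is not Weka's NominalToBinary implementation, and its exact output layout may differ from the filter's; the helper name `nominal_to_binary` is made up, and the category list contains only the two car makes visible in the question.

```python
from typing import List


def nominal_to_binary(value: str, categories: List[str]) -> List[int]:
    """Encode one nominal value as a list of 0/1 indicators, one per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Only the values shown in the question; the real attribute declares more.
CAR_MAKES = ["Toyota", "BMW"]

encoded = nominal_to_binary("BMW", CAR_MAKES)
print(encoded)  # [0, 1]
```

Each nominal attribute then contributes as many numeric columns as it has values, which is what lets a regression method that expects numeric inputs use it.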
In more recent versions (I believe starting with Weka 3.7), RandomForest would work just as you want: the features can be a mix of nominal and numeric attributes, and the predicted attribute can be numeric as well.
The drawback (I would imagine, in your case) is that RandomForest is not an updateable classifier, whereas NaiveBayesUpdateable can be trained incrementally and so works well with large amounts of data that may not fit in memory all at once.
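To show what "updateable" buys you, here is a toy sketch in plain Python of a numeric predictor that is trained one instance at a time and never holds the whole data set in memory. It is neither Weka's API nor naive Bayes: the class `RunningMeanPredictor`, the sample stream, and the fallback of 0.0 for unseen keys are all assumptions made for illustration.

```python
from collections import defaultdict


class RunningMeanPredictor:
    """Predicts the mean target seen so far for each nominal key."""

    def __init__(self) -> None:
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def update(self, key: str, target: float) -> None:
        """Fold a single instance into the model; the instance can then be discarded."""
        self._count[key] += 1
        self._mean[key] += (target - self._mean[key]) / self._count[key]

    def predict(self, key: str) -> float:
        """Return the running mean for the key, or 0.0 if it was never seen."""
        return self._mean.get(key, 0.0)


model = RunningMeanPredictor()
# Hypothetical stream of (car_make, sales) instances, read one at a time.
for car_make, sales in [("Toyota", 100.0), ("BMW", 40.0), ("Toyota", 300.0)]:
    model.update(car_make, sales)

print(model.predict("Toyota"))  # 200.0
print(model.predict("BMW"))     # 40.0
```

Memory use here depends on the number of distinct keys rather than the number of instances, which is the property that makes incremental learners attractive for data that does not fit in memory.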