I would like to use the output model of a Mahout decision tree training process as the input model for a Weka based classifier.
Training a complex decision tree on millions of training records is almost impractical for a single-node Weka classifier, so I would like to use Mahout to build the model, using, for example, its Random Forest Partial Implementation.
While training with the algorithm above can be problematic on a single machine, using the resulting model for prediction with Weka on a single machine is rather simple.
The Mahout wiki states that Weka's ARFF format is among the supported data formats for import, but not for export.
Is it possible to use some of the existing implementations in Mahout to train models that will be used in production with a simple Weka based system?
I don't think it's possible to do what you're asking: .arff is a data format, as are all of the other options in the import/export menus. The classifiers that Weka can save/load are, in fact, Weka's Java Classifier objects written to a file using Java's Serializable interface. They're not so much portable trees as they are Java objects that outlive the JVMs which create them. Thus, to do what you want, either Mahout or Weka would have to be able to produce or read the other's serialized classes, and that's not something I can find any documentation of.
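To illustrate why a saved model is tied to the classes that produced it, here is a minimal sketch using only the JDK. `ThresholdModel` is a hypothetical stand-in for a trained classifier, not a Weka or Mahout class, and the in-memory byte array stands in for a model file. The point it shows is that the serialized form records the concrete Java class name, so only a program that has that same class on its classpath can load it again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class SerializedModelDemo {

    // Hypothetical stand-in for a trained classifier: a single-split decision stump.
    static class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final int featureIndex;
        final double threshold;

        ThresholdModel(int featureIndex, double threshold) {
            this.featureIndex = featureIndex;
            this.threshold = threshold;
        }

        // Returns class 1 if the chosen feature exceeds the threshold, else class 0.
        int classify(double[] features) {
            return features[featureIndex] > threshold ? 1 : 0;
        }
    }

    // Serializes the object with standard Java serialization.
    static byte[] toBytes(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    // Deserializes; fails with ClassNotFoundException if the class is not on the classpath.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // True if the serialized stream embeds the given class name.
    static boolean mentionsClass(byte[] bytes, String className) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) throws Exception {
        ThresholdModel trained = new ThresholdModel(0, 2.5);
        byte[] saved = toBytes(trained);

        ThresholdModel restored = (ThresholdModel) fromBytes(saved);
        System.out.println(restored.classify(new double[] {3.0})); // prints 1
        System.out.println(restored.classify(new double[] {1.0})); // prints 0
        System.out.println(mentionsClass(saved, "ThresholdModel")); // prints true
    }
}
```

A file written this way by one library is therefore only readable by code that ships the same classes, which is why a tree built by Mahout would not load as a Weka Classifier without an explicit converter.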
My experience is that with several million training records (each consisting of ~45 numeric features/columns), Weka's Random Forest implementation with the default options is very fast (running in seconds on a single 2.26GHz core), so it may not be necessary to bother with Mahout. Your data set may well behave differently, though.