I am new to PMML, the Predictive Model Markup Language (www.dmg.org), and I was wondering whether there is any Java support (open source or professional) for creating and parsing PMML files.
Initially I only have in mind creating and parsing PMML files programmatically from Java environments.
I have been googling and have found several possibilities:
Open source:
From Java.
Professional.
DIY
I appreciate all your opinions.
Thanks in advance
Oscar
In the models palette, right-click the palette and select Import PMML from the menu. Select the file to import and specify options for variable labels as required. Click Open.
The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format conceived by Dr. Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago.
JPMML-Evaluator is the de facto reference implementation of PMML specification versions 3.0, 3.1, 3.2, 4.0, 4.1, 4.2, 4.3 and 4.4 for the Java/JVM platform. Its features include pre-processing of input fields according to the DataDictionary and MiningSchema elements, with a complete data type system.
You should realize that the answer may depend on the MODEL-ELEMENT that you want to work with. It is also very likely that your best options for creating PMML and for parsing PMML will come from different software packages. I am going to assume that by 'creation of PMML' you mean creating the document, not fitting the model. I've never heard of anyone integrating automatic model fitting with execution, but perhaps it exists already. Certainly a PMML model could be passed using SOAP.
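To make 'creating the document' concrete, here is a minimal sketch that writes a PMML skeleton using only the XML APIs that ship with the JDK. The class name `PmmlWriterSketch`, the method `buildSkeleton`, and the field names are made up for illustration, and the choice of PMML 4.4 is an assumption. It emits only a Header and a DataDictionary, with no model element.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any PMML library.
public class PmmlWriterSketch {

    // Assumption: we target PMML 4.4; other versions use a different namespace.
    static final String NS = "http://www.dmg.org/PMML-4_4";

    // Builds a PMML skeleton whose DataDictionary declares each name
    // as a continuous field of type double.
    static String buildSkeleton(String... fieldNames) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element pmml = doc.createElementNS(NS, "PMML");
        pmml.setAttribute("version", "4.4");
        doc.appendChild(pmml);

        Element header = doc.createElementNS(NS, "Header");
        header.setAttribute("description", "skeleton written from Java");
        pmml.appendChild(header);

        Element dictionary = doc.createElementNS(NS, "DataDictionary");
        dictionary.setAttribute("numberOfFields", String.valueOf(fieldNames.length));
        pmml.appendChild(dictionary);

        for (String name : fieldNames) {
            Element field = doc.createElementNS(NS, "DataField");
            field.setAttribute("name", name);
            field.setAttribute("optype", "continuous");
            field.setAttribute("dataType", "double");
            dictionary.appendChild(field);
        }

        // Serialize the DOM tree to an indented XML string.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildSkeleton("x1", "x2", "y"));
    }
}
```

A real exporter would also have to write the model element itself (MiningSchema, parameters and so on), which is where a dedicated library or the fitting tool's own export saves most of the work.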
I can't speak to the other projects, but the product offered by Zementis, called Adapa, is used only for the execution of PMML. This product assumes that there is a model fitting application that will do the creating by exporting a fitted model into PMML. There are already a lot of well-developed model fitting applications, so I think this is a reasonable assumption.
The version I have used (3.6) was generally fast but it couldn't handle ensembles of typical random forest size (500+ trees) without an especially large heap. I think they may have fixed this in newer versions. Though it isn't advertised, Zementis doesn't appear to offer a few of the models, namely Text Models, Sequences, Baseline Models, or Time Series (for which the PMML standard currently only has Exponential Smoothing anyway). My version also doesn't have K-Nearest Neighbors but I hear that more recent versions do.
Unless you are considering integrated fitting and execution (in which case you should consider online learning), my advice would be to consider these questions in order:
If you look at the list of DMG members you will find many commercial vendors that are either on the supply side (e.g. SAS, SPSS, Togaware, Rapid-I) or on the demand side (too many to list).
Your list also doesn't mention Weka, which can execute some PMML models. There are also R/Java-based solutions, so you could run PMML-to-R imports (see fileToXMLNode) from a Java environment, though you could also just execute R directly.
Finally, if you have a very specific model in mind and you understand what it means mathematically to 'execute it' then it shouldn't be too difficult to build what you need yourself.
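As an illustration of the do-it-yourself route, here is a rough sketch for one very specific case: a linear regression whose RegressionTable contains only NumericPredictor terms. It uses the JDK's built-in DOM parser; the class name `LinearPmmlSketch`, the method `predict`, and the sample document are made up for illustration. It deliberately ignores the MiningSchema, missing-value handling, CategoricalPredictor and PredictorTerm elements, and any normalization method, so treat it as a starting point rather than a conforming evaluator.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any PMML library.
public class LinearPmmlSketch {

    // Made-up sample document: y = 1.5 + 2.0 * x1 - 0.5 * x2
    static final String SAMPLE =
        "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">"
      + "<RegressionModel functionName=\"regression\">"
      + "<RegressionTable intercept=\"1.5\">"
      + "<NumericPredictor name=\"x1\" coefficient=\"2.0\"/>"
      + "<NumericPredictor name=\"x2\" coefficient=\"-0.5\"/>"
      + "</RegressionTable></RegressionModel></PMML>";

    // Evaluates the first RegressionTable found:
    // intercept + sum over predictors of coefficient * x^exponent.
    static double predict(String pmmlXml, Map<String, Double> inputs) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));

        // "*" matches any namespace, so the PMML version does not matter here.
        Element table = (Element) doc.getElementsByTagNameNS("*", "RegressionTable").item(0);
        if (table == null) {
            throw new IllegalArgumentException("no RegressionTable element found");
        }
        double result = Double.parseDouble(table.getAttribute("intercept"));

        NodeList predictors = table.getElementsByTagNameNS("*", "NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element predictor = (Element) predictors.item(i);
            String name = predictor.getAttribute("name");
            Double x = inputs.get(name);
            if (x == null) {
                // Simplification: a real evaluator applies missing-value rules instead.
                throw new IllegalArgumentException("missing input: " + name);
            }
            double coefficient = Double.parseDouble(predictor.getAttribute("coefficient"));
            // The exponent attribute is optional and defaults to 1.
            String exponentText = predictor.getAttribute("exponent");
            int exponent = exponentText.isEmpty() ? 1 : Integer.parseInt(exponentText);
            result += coefficient * Math.pow(x, exponent);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(predict(SAMPLE, Map.of("x1", 2.0, "x2", 3.0)));
    }
}
```

For anything beyond a single simple model type (trees, ensembles, transformations), the amount of specification you would have to re-implement grows quickly, and an existing evaluator is probably the better choice.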