 

Java Support for PMML

I am new to PMML (Predictive Model Markup Language, www.dmg.org) and I was wondering whether there is any Java support (open source or professional) for creating/parsing PMML files.

For now I only have in mind creating/parsing PMML files programmatically from Java.

I have been "googling" and I have found several possibilities:

Open source:

  • jpmml (PMML 3.2).

From Java:

  • JDM (javax.datamining). It seems to be dead? Does anyone have more info?

Professional:

  • Zementis (http://www.zementis.com/pmml_tools.htm).

DIY:

  • Use a Java XML library and build a PMML parser/writer yourself.
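For the DIY route, the JDK alone is enough to get started. Below is a minimal sketch, assuming the whole PMML document fits in memory as a DOM: it reads the root version attribute and the DataField entries of the DataDictionary. Elements are matched by local name in any namespace, because each PMML version uses its own namespace (e.g. http://www.dmg.org/PMML-3_2). The class and method names are hypothetical, and the sample document is a made-up fragment, not a complete model.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads the DataDictionary of a PMML document with plain JAXP/DOM.
class PmmlDataDictionaryReader {

    /** One DataField entry from the DataDictionary. */
    static final class DataField {
        final String name;
        final String optype;
        final String dataType;

        DataField(String name, String optype, String dataType) {
            this.name = name;
            this.optype = optype;
            this.dataType = dataType;
        }

        @Override
        public String toString() {
            return name + " (" + optype + ", " + dataType + ")";
        }
    }

    /** Parses a PMML document; namespace-aware so any PMML version's namespace works. */
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    /** Returns the version attribute of the root PMML element, e.g. "3.2". */
    static String version(Document doc) {
        return doc.getDocumentElement().getAttribute("version");
    }

    /** Collects every DataField element, matching its local name in any namespace. */
    static List<DataField> dataFields(Document doc) {
        List<DataField> fields = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagNameNS("*", "DataField");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            fields.add(new DataField(e.getAttribute("name"), e.getAttribute("optype"), e.getAttribute("dataType")));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Made-up PMML 3.2 fragment containing only a DataDictionary.
        String pmml = "<PMML xmlns=\"http://www.dmg.org/PMML-3_2\" version=\"3.2\">"
                + "<DataDictionary numberOfFields=\"2\">"
                + "<DataField name=\"x\" optype=\"continuous\" dataType=\"double\"/>"
                + "<DataField name=\"y\" optype=\"continuous\" dataType=\"double\"/>"
                + "</DataDictionary>"
                + "</PMML>";
        Document doc = parse(new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("PMML version: " + version(doc));
        for (DataField f : dataFields(doc)) {
            System.out.println(f);
        }
    }
}
```

Running main prints the version (3.2) followed by one line per DataField. For very large files, a streaming parser (StAX) would avoid holding the whole DOM in memory.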

I would appreciate any opinions.

Thanks in advance

Oscar

Oscar asked Sep 02 '11 08:09


1 Answer

You should realize that the answer may depend on the MODEL-ELEMENT that you want to work with. It is also very likely that your best options for creating PMML and for parsing PMML will come from different software packages. I am going to assume that by 'creating PMML' you mean creating the document, not fitting the model. I've never heard of anyone integrating automatic model fitting with execution, but perhaps it already exists. A PMML model could certainly be passed around using SOAP.

I can't speak to the other projects, but the product offered by Zementis, called Adapa, is used only for executing PMML. It assumes that some model fitting application does the creating, by exporting a fitted model into PMML. There are already a lot of well-developed model fitting applications, so I think this is a reasonable assumption.
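To make "exporting a fitted model into PMML" concrete, here is a minimal sketch of what such an export could produce for a linear regression y = intercept + sum(coefficient_i * x_i), using only the JDK's DOM and Transformer APIs. The class name, the field names, and the choice of PMML 4.3 are assumptions for illustration, and the output has not been validated against the DMG schema.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical exporter: writes an already-fitted linear model as a PMML RegressionModel.
class LinearModelPmmlWriter {

    static final String NS = "http://www.dmg.org/PMML-4_3";

    /** Creates a child element in the PMML namespace and appends it to parent. */
    private static Element child(Document doc, Element parent, String name) {
        Element e = doc.createElementNS(NS, name);
        parent.appendChild(e);
        return e;
    }

    /** Adds a continuous double DataField to the DataDictionary. */
    private static void addDataField(Document doc, Element dict, String name) {
        Element f = child(doc, dict, "DataField");
        f.setAttribute("name", name);
        f.setAttribute("optype", "continuous");
        f.setAttribute("dataType", "double");
    }

    /** coefficients maps each input field name to its fitted coefficient (exponent 1). */
    static String toPmml(String target, double intercept, Map<String, Double> coefficients) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element pmml = doc.createElementNS(NS, "PMML");
        pmml.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns", NS);
        pmml.setAttribute("version", "4.3");
        doc.appendChild(pmml);
        child(doc, pmml, "Header").setAttribute("description", "Linear regression sketch");

        // DataDictionary: every input plus the target.
        Element dict = child(doc, pmml, "DataDictionary");
        dict.setAttribute("numberOfFields", String.valueOf(coefficients.size() + 1));
        for (String name : coefficients.keySet()) {
            addDataField(doc, dict, name);
        }
        addDataField(doc, dict, target);

        Element model = child(doc, pmml, "RegressionModel");
        model.setAttribute("functionName", "regression");

        // MiningSchema: inputs use the default usage (active); the target is predicted.
        Element schema = child(doc, model, "MiningSchema");
        for (String name : coefficients.keySet()) {
            child(doc, schema, "MiningField").setAttribute("name", name);
        }
        Element targetField = child(doc, schema, "MiningField");
        targetField.setAttribute("name", target);
        targetField.setAttribute("usageType", "predicted");

        // RegressionTable: the intercept plus one NumericPredictor per input.
        Element table = child(doc, model, "RegressionTable");
        table.setAttribute("intercept", Double.toString(intercept));
        for (Map.Entry<String, Double> c : coefficients.entrySet()) {
            Element p = child(doc, table, "NumericPredictor");
            p.setAttribute("name", c.getKey());
            p.setAttribute("exponent", "1");
            p.setAttribute("coefficient", Double.toString(c.getValue()));
        }

        // Serialize the DOM to an indented XML string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> coef = new LinkedHashMap<>();
        coef.put("x1", 2.0);
        coef.put("x2", -0.5);
        System.out.println(toPmml("y", 1.5, coef));
    }
}
```

In practice the fitting application (R, SAS, SPSS, etc.) would emit this for you; the sketch only shows that the document side of "creating PMML" is ordinary XML generation.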

The version I have used (3.6) was generally fast, but it couldn't handle ensembles of typical random forest size (500+ trees) without an especially large heap. I think they may have fixed this in newer versions. Though it isn't advertised, Zementis doesn't appear to support a few model types, namely Text Models, Sequences, Baseline Models, and Time Series (for which the PMML standard currently only defines Exponential Smoothing anyway). My version also doesn't support K-Nearest Neighbors, but I hear that more recent versions do.

Unless you are considering integrated fitting and execution (in which case you should look at online learning), my advice would be to consider these questions in order:

  1. What is the model type that I am interested in using?
  2. What application/s do I prefer to build models in?
  3. Finally, how will I execute the model, and what requirements do I have in this regard (web services, cloud, performance, etc.)?

If you look at the list of DMG members, you will find many commercial vendors on either the supply side (e.g. SAS, SPSS, Togaware, Rapid-I) or the demand side (too many to list).

Your list also doesn't mention Weka, which can execute some PMML models too. There are also R/Java-based solutions, so you could run PMML-to-R imports (see fileToXMLNode) from a Java environment (though you could also just run R directly).

Finally, if you have a very specific model in mind and you understand what it means mathematically to 'execute it' then it shouldn't be too difficult to build what you need yourself.
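As an illustration of how small "executing" a model can be when you know the math: a plain linear RegressionModel computes the intercept plus the sum of coefficient * x^exponent over its NumericPredictor elements (exponent defaults to 1). The sketch below (JDK only; the class name is hypothetical) evaluates exactly that case. It deliberately ignores CategoricalPredictor, PredictorTerm, normalizationMethod, missing-value handling and the MiningSchema, all of which a real evaluator would have to respect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical, deliberately limited evaluator: numeric predictors of a single RegressionTable only.
class TinyRegressionEvaluator {

    private final double intercept;
    // Predictor name -> {coefficient, exponent}.
    private final Map<String, double[]> predictors = new HashMap<>();

    TinyRegressionEvaluator(String pmmlXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(pmmlXml.getBytes(StandardCharsets.UTF_8)));

        // Only the first RegressionTable is read (a regression-function model has one).
        Element table = (Element) doc.getElementsByTagNameNS("*", "RegressionTable").item(0);
        if (table == null) {
            throw new IllegalArgumentException("no RegressionTable found");
        }
        intercept = Double.parseDouble(table.getAttribute("intercept"));

        NodeList nps = table.getElementsByTagNameNS("*", "NumericPredictor");
        for (int i = 0; i < nps.getLength(); i++) {
            Element np = (Element) nps.item(i);
            String exp = np.getAttribute("exponent");
            predictors.put(np.getAttribute("name"), new double[] {
                    Double.parseDouble(np.getAttribute("coefficient")),
                    exp.isEmpty() ? 1 : Integer.parseInt(exp)}); // exponent defaults to 1
        }
    }

    /** y = intercept + sum(coefficient * x^exponent) over the NumericPredictors. */
    double evaluate(Map<String, Double> inputs) {
        double y = intercept;
        for (Map.Entry<String, double[]> p : predictors.entrySet()) {
            Double x = inputs.get(p.getKey());
            if (x == null) {
                throw new IllegalArgumentException("missing input: " + p.getKey());
            }
            y += p.getValue()[0] * Math.pow(x, p.getValue()[1]);
        }
        return y;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed to what the evaluator reads; not a schema-valid PMML document.
        String pmml = "<PMML xmlns=\"http://www.dmg.org/PMML-4_3\" version=\"4.3\">"
                + "<RegressionModel functionName=\"regression\">"
                + "<RegressionTable intercept=\"1.5\">"
                + "<NumericPredictor name=\"x1\" coefficient=\"2\"/>"
                + "<NumericPredictor name=\"x2\" coefficient=\"-0.5\"/>"
                + "</RegressionTable></RegressionModel></PMML>";
        Map<String, Double> inputs = new HashMap<>();
        inputs.put("x1", 3.0);
        inputs.put("x2", 4.0);
        // 1.5 + 2 * 3 + (-0.5) * 4 = 5.5
        System.out.println(new TinyRegressionEvaluator(pmml).evaluate(inputs)); // prints 5.5
    }
}
```

Extending this to the rest of the RegressionModel element (or to another model type) is mostly a matter of reading the specification for that element carefully; for broad coverage, a full evaluator such as jpmml is the safer choice.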

Meadowlark Bradsher answered Oct 09 '22 04:10