I use Spark 2.1.0.
I've been trying to export a Spark MLlib linear regression model as a PMML file, and the export itself works. However, I can't see any field names in the file. All I can see is something like this:
Can anyone tell me the reason for this? And how can I get the actual column names in their place?
There are two approaches to exporting Apache Spark models to the PMML data format. When working at the Spark ML abstraction level, you can use the JPMML-SparkML library. When working at the Spark MLlib abstraction level, which appears to be the case here, you can use the built-in PMMLExportable trait.
JPMML-SparkML retrieves column names from the Spark ML data schema via DataFrame#schema(). Unfortunately, Spark MLlib has no such option, so the feature names "field_{n}" and the label name "target" are simply hard-coded dummy names.
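The dummy names are positional: "field_0" stands for the first element of each feature vector, "field_1" for the second, and so on. The following Java sketch pairs each dummy name with a real column name. The class name `DummyFieldNames` and the example columns are hypothetical, and the sketch assumes you know the order in which you placed column values into each feature vector (MLlib does not record that order for you).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DummyFieldNames {

    // Spark MLlib's PMML export names the label "target" ...
    static final String DUMMY_LABEL = "target";

    // ... and the i-th element of the feature vector "field_" + i.
    static String dummyFeatureName(int index) {
        return "field_" + index;
    }

    // Pairs every dummy name with a real column name. Assumption: featureColumns lists the
    // columns in the same order in which their values were placed into each feature vector.
    static Map<String, String> renameMap(List<String> featureColumns, String labelColumn) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (int i = 0; i < featureColumns.size(); i++) {
            renames.put(dummyFeatureName(i), featureColumns.get(i));
        }
        renames.put(DUMMY_LABEL, labelColumn);
        return renames;
    }

    public static void main(String[] args) {
        // Hypothetical column names, for illustration only
        Map<String, String> renames = renameMap(Arrays.asList("age", "income"), "salary");
        System.out.println(renames); // prints {field_0=age, field_1=income, target=salary}
    }
}
```

Each entry of the resulting map corresponds to one renaming step, for example one FieldRenamer per dummy name in the JPMML-Model snippet.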
It is fairly easy to rename fields in the PMML document using the JPMML-Model library:
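// Scala (Spark MLlib): write the raw PMML file with the dummy field names
pmmlExportable.toPMML("/tmp/raw-pmml-file")
// Java (JPMML-Model): load the raw PMML file into a PMML object
org.dmg.pmml.PMML pmml = org.jpmml.model.JAXBUtil.unmarshalPMML(new javax.xml.transform.stream.StreamSource(new java.io.File("/tmp/raw-pmml-file")));
// Rename the label field "target" (and every reference to it) to "y"
org.jpmml.model.visitors.FieldRenamer targetRenamer = new org.jpmml.model.visitors.FieldRenamer(org.dmg.pmml.FieldName.create("target"), org.dmg.pmml.FieldName.create("y"));
targetRenamer.applyTo(pmml);
// Write the modified PMML object to a new file
org.jpmml.model.JAXBUtil.marshalPMML(pmml, new javax.xml.transform.stream.StreamResult(new java.io.File("/tmp/final-pmml-file")));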
After marshalling the modified PMML object back to a file, you can see that the field "target" (and all references to it) has been renamed to "y". Repeat the procedure for each of the "field_{n}" feature fields.
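If adding the JPMML-Model dependency is not an option, a rough equivalent can be written with the JDK's built-in DOM API, renaming all fields in one pass. This is a swapped-in technique, not JPMML-Model's FieldRenamer, and only a minimal sketch: it assumes field names appear only in `name` and `field` attributes, which seems to hold for the DataField, MiningField and NumericPredictor elements of a linear regression export but may not hold for other model types. The class name `PmmlFieldRenameSketch` and the hand-written PMML fragment are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PmmlFieldRenameSketch {

    // Assumption: in this kind of export, field names only appear in these attributes.
    private static final String[] FIELD_NAME_ATTRIBUTES = {"name", "field"};

    // Hand-written, trimmed fragment shaped like a linear regression export (hypothetical values).
    static final String SAMPLE_PMML =
        "<PMML xmlns=\"http://www.dmg.org/PMML-4_2\" version=\"4.2\">"
        + "<DataDictionary numberOfFields=\"2\">"
        + "<DataField name=\"field_0\" optype=\"continuous\" dataType=\"double\"/>"
        + "<DataField name=\"target\" optype=\"continuous\" dataType=\"double\"/>"
        + "</DataDictionary>"
        + "<RegressionModel functionName=\"regression\">"
        + "<MiningSchema>"
        + "<MiningField name=\"field_0\" usageType=\"active\"/>"
        + "<MiningField name=\"target\" usageType=\"target\"/>"
        + "</MiningSchema>"
        + "<RegressionTable intercept=\"0.5\">"
        + "<NumericPredictor name=\"field_0\" coefficient=\"2.0\"/>"
        + "</RegressionTable>"
        + "</RegressionModel>"
        + "</PMML>";

    // Rewrites every name/field attribute whose value is a key of `renames`.
    static String rename(String pmmlXml, Map<String, String> renames) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmmlXml)));

            // Visit every element in the document, whatever its namespace.
            NodeList elements = doc.getElementsByTagNameNS("*", "*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                for (String attribute : FIELD_NAME_ATTRIBUTES) {
                    String newName = element.hasAttribute(attribute)
                        ? renames.get(element.getAttribute(attribute))
                        : null;
                    if (newName != null) {
                        element.setAttribute(attribute, newName);
                    }
                }
            }

            // Serialize the modified DOM back to XML text.
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite PMML document", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("field_0", "age"); // hypothetical feature column name
        renames.put("target", "y");
        System.out.println(rename(SAMPLE_PMML, renames));
    }
}
```

Note that `usageType="target"` is left untouched, because only `name` and `field` attributes are rewritten. FieldRenamer is likely the more robust choice, since it targets PMML field references specifically rather than relying on a guessed attribute list.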