Export python scikit learn models into pmml

3 Answers

SkLearn2PMML is

a thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, please refer to the documentation of the JPMML-SkLearn project.

As @user1808924 notes, it supports Python 2.7 or 3.4+. It also requires Java 1.7+

Installed via: (requires git)

pip install git+https://github.com/jpmml/sklearn2pmml.git

Example of how export a classifier tree to PMML. First grow the tree:

# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier() 
clf = clf.fit(iris.data, iris.target)

There are two parts to an SkLearn2PMML conversion, an estimator (our clf) and a mapper (for preprocessing steps such as discretization or PCA). Our mapper is pretty basic, since we are not doing any transformations.

from sklearn_pandas import DataFrameMapper
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])

from sklearn2pmml import sklearn2pmml
sklearn2pmml(estimator=clf, 
             mapper=default_mapper, 
             pmml="D:/workspace/IrisClassificationTree.pmml")

It is possible (though not documented) to pass mapper=None, but you will see that the predictor names get lost (returning x1 not sepal length etc.).

Let's look at the .pmml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <Header>
        <Application name="JPMML-SkLearn" version="1.1.1"/>
        <Timestamp>2016-09-26T19:21:43Z</Timestamp>
    </Header>
    <DataDictionary>
        <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="setosa"/>
            <Value value="versicolor"/>
            <Value value="virginica"/>
        </DataField>
    </DataDictionary>
    <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="sepal length (cm)"/>
            <MiningField name="sepal width (cm)"/>
            <MiningField name="petal length (cm)"/>
            <MiningField name="petal width (cm)"/>
        </MiningSchema>
        <Output>
            <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/>
            <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/>
            <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/>
        </Output>
        <Node id="1">
            <True/>
            <Node id="2" score="setosa" recordCount="50.0">
                <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
                <ScoreDistribution value="setosa" recordCount="50.0"/>
                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                <ScoreDistribution value="virginica" recordCount="0.0"/>
            </Node>
            <Node id="3">
                <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
                <Node id="4">
                    <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/>
                    <Node id="5">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/>
                        <Node id="6" score="versicolor" recordCount="47.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="47.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                        <Node id="7" score="virginica" recordCount="1.0">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="1.0"/>
                        </Node>
                    </Node>
                    <Node id="8">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/>
                        <Node id="9" score="virginica" recordCount="3.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="3.0"/>
                        </Node>
                        <Node id="10">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/>
                            <Node id="11" score="versicolor" recordCount="2.0">
                                <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="2.0"/>
                                <ScoreDistribution value="virginica" recordCount="0.0"/>
                            </Node>
                            <Node id="12" score="virginica" recordCount="1.0">
                                <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                                <ScoreDistribution value="virginica" recordCount="1.0"/>
                            </Node>
                        </Node>
                    </Node>
                </Node>
                <Node id="13">
                    <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
                    <Node id="14">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
                        <Node id="15" score="virginica" recordCount="2.0">
                            <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="2.0"/>
                        </Node>
                        <Node id="16" score="versicolor" recordCount="1.0">
                            <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="1.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                    </Node>
                    <Node id="17" score="virginica" recordCount="43.0">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
                        <ScoreDistribution value="setosa" recordCount="0.0"/>
                        <ScoreDistribution value="versicolor" recordCount="0.0"/>
                        <ScoreDistribution value="virginica" recordCount="43.0"/>
                    </Node>
                </Node>
            </Node>
        </Node>
    </TreeModel>
</PMML>

The first split (Node 1) is on petal width at 0.8. Node 2 (petal width <= 0.8) captures all of the setosa, with nothing else.

You can compare the pmml output to the graphviz output:

from sklearn.externals.six import StringIO
import pydotplus # this might be pydot for python 2.7
dot_data = StringIO() 
tree.export_graphviz(clf, 
                     out_file=dot_data,  
                     feature_names=iris.feature_names,  
                     class_names=iris.target_names,  
                     filled=True, rounded=True,  
                     special_characters=True) 
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("D:/workspace/iris.pdf") 
# for in-line display, you can also do:
# from IPython.display import Image  
# Image(graph.create_png())

enter image description here

answered Oct 19 '22 17:10

C8H10N4O2

Feel free to try Nyoka. Exports SKL models and then some.

answered Oct 19 '22 17:10

dr. rAI

Nyoka is a python library having support for Scikit-learn, XGBoost, LightGBM, Keras and Statsmodels.

Besides about 500 Python classes which each cover a PMML tag and all constructor parameters/attributes as defined in the standard, Nyoka also provides an increasing number of convenience classes and functions that make the Data Scientist’s life easier for example by reading or writing any PMML file in one line of code from within your favorite Python environment.

It can be installed from PyPi using :

pip install nyoka

Example code

Example 1

import pandas as pd
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn_pandas import DataFrameMapper
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
irisd = pd.DataFrame(iris.data, columns=iris.feature_names)
irisd['Species'] = iris.target

features = irisd.columns.drop('Species')
target = 'Species'

pipeline_obj = Pipeline([
    ("mapping", DataFrameMapper([
    (['sepal length (cm)', 'sepal width (cm)'], StandardScaler()) , 
    (['petal length (cm)', 'petal width (cm)'], Imputer())
    ])),
    ("rfc", RandomForestClassifier(n_estimators = 100))
])

pipeline_obj.fit(irisd[features], irisd[target])

from nyoka import skl_to_pmml

skl_to_pmml(pipeline_obj, features, target, "rf_pmml.pmml")

Example 2

from keras import applications
from keras.layers import Flatten, Dense
from keras.models import Model

model = applications.MobileNet(weights='imagenet', include_top=False,input_shape = (224, 224,3))

activType='sigmoid'
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(2, activation=activType)(x)
model_final = Model(inputs =model.input, outputs = predictions,name='predictions')

from nyoka import KerasToPmml
cnn_pmml = KerasToPmml(model_final,dataSet='image',predictedClasses=['cats','dogs'])

cnn_pmml.export(open('2classMBNet.pmml', "w"), 0)

More examples can be found in Nyoka's Github Page .

answered Oct 19 '22 17:10

Nirmal

Related questions
                            
                                How to get geom_smooth() ignore my colour grouping
                            
                                How to do basic dependency injection in Python (for mocking/testing purposes)
                            
                                "control reaches end of non-void function" with fully handled case switch over an enum type
                            
                                Map and Reduce JSON Objects with JavaScript
                            
                                Does adding a new dependency to a library, with compatible API changes, affect binary compatibility?
                            
                                Pyspark DataFrame UDF on Text Column
                            
                                What does "CC Graph Reduction" in Firefox Performance tool mean?
                            
                                WebStorm becomes very slow for any typescript files
                            
                                Managing the product count in the database
                            
                                React.js and jsx over ejs?
                            
                                What is a good explanation of how to read the histogram feature of TensorBoard?
                            
                                How do we call a normal function where a coroutine is expected?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Export python scikit learn models into pmml

Tags:

python

scikit-learn

pmml

Selva

People also ask

3 Answers

C8H10N4O2

dr. rAI

Example code

Example 1

Example 2

Nirmal

Recent Activity

Donate For Us