Knime has generated a PMML model for me. I now want to apply this model in a Python process. What is the right way to do this?
In more detail: I am developing a Django student attendance system. The application is mature enough that I have time to implement an 'I'm feeling lucky' button that automatically fills in an attendance form. This is where PMML comes in: Knime has generated a PMML model that predicts student attendance. Also, thanks to Django for being so productive that I have time for this great work ;)
Start by creating a new predictive model. Give your model a name and in the 'Create model' section, click Import PMML. Click Choose File and select the model file to upload.
To import a model saved as PMML, right-click the models palette and select Import PMML from the menu. Select the file to import and specify options for variable labels as required, then click Open. See the topic Model types supporting PMML for more information.
PMML stands for Predictive Model Markup Language. It is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms.
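Because PMML is plain XML, a file can be inspected with the Python standard library before picking a scoring library. The following is a minimal sketch, not a full PMML reader: the PMML fragment, its field names, and the helper `describe_pmml` are hypothetical and written by hand for illustration. A real Knime export will contain more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (not a real Knime export).
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="toy attendance model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="weekday" optype="categorical" dataType="string"/>
    <DataField name="absences_last_week" optype="continuous" dataType="double"/>
    <DataField name="attends" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="toy" functionName="classification">
    <MiningSchema>
      <MiningField name="weekday"/>
      <MiningField name="absences_last_week"/>
      <MiningField name="attends" usageType="target"/>
    </MiningSchema>
    <Node score="yes"><True/></Node>
  </TreeModel>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_pmml(text):
    """Return the declared fields, the first model element and the target field."""
    root = ET.fromstring(text)
    fields = {}
    model_type = None
    target = None
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "DataField":
            fields[elem.get("name")] = elem.get("dataType")
        elif name.endswith("Model") and model_type is None:
            model_type = name
        elif name == "MiningField" and elem.get("usageType") in ("target", "predicted"):
            target = elem.get("name")
    return {"fields": fields, "model_type": model_type, "target": target}


info = describe_pmml(PMML_TEXT)
print(info["model_type"], info["target"], list(info["fields"]))
```

Knowing the model element (for example `TreeModel`) and the input field names makes it easier to check that the data passed to a scoring library uses the names the model expects.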
model.predict(): given a trained model, predict the label of a new set of data. This method accepts one argument, the new data X_new (e.g. model.predict(X_new)), and returns the learned label for each object in the array.
In the end I wrote my own code. Feel free to contribute or fork it:
https://github.com/ctrl-alt-d/lightpmmlpredictor
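To give an idea of what a lightweight, pure-Python predictor can look like, here is a minimal sketch that scores a PMML TreeModel with only the standard library. It is not the code of the linked project. It assumes a tree that uses only `True`, `False` and `SimplePredicate` tests with the comparison operators listed below, and it returns None when no child node matches. The PMML fragment and the field names are hypothetical.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical, hand-written decision tree (not a real Knime export).
TREE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <TreeModel functionName="classification">
    <Node score="yes">
      <True/>
      <Node score="no">
        <SimplePredicate field="absences_last_week" operator="greaterThan" value="2"/>
      </Node>
      <Node score="yes">
        <SimplePredicate field="absences_last_week" operator="lessOrEqual" value="2"/>
        <Node score="no">
          <SimplePredicate field="weekday" operator="equal" value="friday"/>
        </Node>
        <Node score="yes">
          <True/>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Only these SimplePredicate operators are handled in this sketch.
OPERATORS = {
    "equal": operator.eq,
    "notEqual": operator.ne,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def child_nodes(elem):
    return [c for c in elem if local_name(c.tag) == "Node"]


def matches(node, row):
    """Evaluate the predicate of a tree node against one input row (a dict)."""
    for child in node:
        kind = local_name(child.tag)
        if kind == "Extension":
            continue
        if kind == "True":
            return True
        if kind == "False":
            return False
        if kind == "SimplePredicate":
            actual = row[child.get("field")]
            expected = child.get("value")
            # Compare numerically when the row value is a number.
            if isinstance(actual, (int, float)):
                expected = float(expected)
            return OPERATORS[child.get("operator")](actual, expected)
        raise ValueError(f"unsupported predicate: {kind}")
    raise ValueError("node has no predicate")


def predict(pmml_text, row):
    """Walk the tree from the root and return the score of the leaf reached."""
    root = ET.fromstring(pmml_text)
    tree = next(e for e in root.iter() if local_name(e.tag) == "TreeModel")
    node = child_nodes(tree)[0]
    if not matches(node, row):
        return None
    while True:
        kids = child_nodes(node)
        if not kids:
            return node.get("score")
        # Take the first child whose predicate holds; give up if none does.
        node = next((k for k in kids if matches(k, row)), None)
        if node is None:
            return None


print(predict(TREE_PMML, {"weekday": "monday", "absences_last_week": 0}))
```

In a Django view, a function like `predict` could be called once per student to pre-fill the attendance form. Compound predicates, missing-value handling and other model types are left out here, which is where a full library earns its keep.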
The code for Augustus, to score PMML models in Python, is at https://code.google.com/p/augustus/
You could use PyPMML to apply PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data can be a dict, a JSON string, or a Pandas Series or DataFrame.
If you use PMML in PySpark, you could use PyPMML-Spark, for example:
from pypmml_spark import ScoreModel
model = ScoreModel.fromFile('the/pmml/file/path')
score_df = model.transform(df)
Here df is a PySpark DataFrame.
For more information about other PMML libraries, see: https://github.com/autodeployai