
How can I call scikit-learn classifiers from Java?

I have a classifier that I trained using Python's scikit-learn. How can I use the classifier from a Java program? Can I use Jython? Is there some way to save the classifier in Python and load it in Java? Is there some other way to use it?

Thomas Johnson asked Oct 05 '12 02:10




2 Answers

You cannot use Jython: scikit-learn relies heavily on NumPy and SciPy, which contain many compiled C and Fortran extensions and therefore cannot run on Jython.

The easiest ways to use scikit-learn in a Java environment would be to:

  • expose the classifier as an HTTP/JSON service, for instance using a microframework such as Flask, Bottle, or Cornice, and call it from Java using an HTTP client library

  • write a command-line wrapper application in Python that reads data on stdin and writes predictions to stdout in a format such as CSV or JSON (or some lower-level binary representation), and call the Python program from Java, for instance using Apache Commons Exec.

  • make the Python program output the raw numerical parameters learned at fit time (typically as an array of floating-point values) and reimplement the predict function in Java (this is typically easy for linear predictive models, where the prediction is often just a thresholded dot product).
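As a rough illustration of that last option, here is a minimal sketch for a binary linear model. It assumes the weights and bias were exported from the fitted estimator's coef_ and intercept_ attributes; the class name LinearClassifier and the numbers in main are made up for illustration and do not come from any real model. Labels 0 and 1 stand for the first and second entries of the model's classes_.

```java
public class LinearClassifier {
    private final double[] coef;     // weights exported from Python (assumed: coef_[0])
    private final double intercept;  // bias exported from Python (assumed: intercept_[0])

    public LinearClassifier(double[] coef, double intercept) {
        this.coef = coef.clone();
        this.intercept = intercept;
    }

    // Dot product of weights and features, plus the bias term.
    public double decisionFunction(double[] features) {
        if (features.length != coef.length) {
            throw new IllegalArgumentException(
                "Expected " + coef.length + " features, got " + features.length);
        }
        double score = intercept;
        for (int i = 0; i < coef.length; i++) {
            score += coef[i] * features[i];
        }
        return score;
    }

    // Threshold the score at zero: 1 for a positive score, 0 otherwise.
    public int predict(double[] features) {
        return decisionFunction(features) > 0.0 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Illustrative parameters only, not taken from a trained model.
        LinearClassifier model = new LinearClassifier(new double[] {2.0, -1.0}, -0.5);
        System.out.println(model.predict(new double[] {1.0, 1.0}));
    }
}
```

The Java side must feed the features in exactly the same order and scaling that the Python model saw during training.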

The last approach will be a lot more work if you need to re-implement feature extraction in Java as well.
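The command-line wrapper option from the list above could look roughly like this on the Java side. This sketch uses the JDK's own ProcessBuilder rather than Apache Commons Exec, only to stay dependency-free. The class name PredictorProcess is hypothetical, and the one-CSV-row-in, one-prediction-line-out protocol is an assumption about how the Python wrapper would be written.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PredictorProcess {

    // Starts the given command, writes one CSV row per line to its stdin,
    // and returns every line the process prints on stdout.
    public static List<String> predict(List<String> command, List<String> csvRows)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT); // show Python tracebacks
        Process process = builder.start();

        // Write on a separate thread so that a full pipe buffer cannot
        // block the writer while nobody is reading stdout.
        Thread writer = new Thread(() -> {
            try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                    process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String row : csvRows) {
                    stdin.write(row);
                    stdin.newLine();
                }
            } catch (IOException e) {
                // The child exited early; the exit code check below reports it.
            }
        });
        writer.start();

        List<String> predictions = new ArrayList<>();
        try (BufferedReader stdout = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                predictions.add(line);
            }
        }

        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Predictor exited with code " + exitCode);
        }
        return predictions;
    }
}
```

A call might look like PredictorProcess.predict(List.of("python", "predict.py"), rows), where predict.py is a hypothetical script that loads the saved classifier and prints one prediction per input row. Starting a Python interpreter per call is slow, so batching many rows into one invocation is usually preferable.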

Finally, you can use a Java library such as Weka or Mahout that implements the algorithms you need instead of trying to use scikit-learn from Java.

ogrisel answered Sep 22 '22 08:09



The JPMML project exists for this purpose.

First, serialize the scikit-learn model to PMML (an XML-based format). You can do this directly from Python with the sklearn2pmml library, or dump the model in Python first and convert it with jpmml-sklearn, either from Java or from the command-line tool that library provides. Next, load the PMML file and evaluate the deserialized model with jpmml-evaluator in your Java code.

This approach does not work with every scikit-learn model, but it covers many of them.

Dmitry Spikhalskiy answered Sep 22 '22 08:09
