I have locally trained a sklearn classifier and I have to create a simple web application that demonstrates its use. I'm a complete noob at web app development and I don't want to waste hours creating a web app with a framework that doesn't support the modules I'm using.
Do I need something like Heroku, django, etc., or are there simpler and quicker solutions for a simple scientific demo? My thought was to take the classifier I trained, pickle it and un-pickle it on the server, then run classify from the server, but I'm not sure where to begin.
Scikit-learn is an indispensable part of the Python machine learning toolkit at JPMorgan. It is very widely used across all parts of the bank for classification, predictive analytics, and very many other machine learning tasks.
Sklearn is an open source library which uses the BSD license. It is widely used in industry as well as in academia. It is built on NumPy, SciPy and Matplotlib, and also wraps popular libraries such as LIBSVM. Sklearn can be used “out of the box” after installation.
PyTorch vs Scikit-Learn: while Sklearn is mostly used for classical machine learning, PyTorch is designed for deep learning. Sklearn is good for defining algorithms, but cannot really be used for end-to-end training of deep neural networks. Ease of use: Sklearn is generally easier to use than PyTorch.
scikit-learn and sklearn both refer to the same package; however, there are a couple of things you need to be aware of. The package is imported as sklearn, but it should be installed through pip using the scikit-learn identifier (pip install scikit-learn); the sklearn name on PyPI is deprecated.
If this is just for a demo, train your classifier offline, pickle the model and then use a simple python web framework such as flask or bottle to unpickle the model at server startup time and call the predict function in an HTTP request handler.
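A minimal sketch of that approach with Flask could look like the following. The `/predict` route, the `{"features": [...]}` request shape, the `model.pkl` file name and the `create_app`/`main` helpers are all assumptions made for illustration; the only thing required of the model is a scikit-learn style `predict` method.

```python
import pickle

from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around an already-loaded model.

    `model` is any object with a scikit-learn style predict(X) method.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Assumed request body: {"features": [[f1, f2, ...], ...]}
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict) or "features" not in payload:
            return jsonify(error="expected JSON with a 'features' key"), 400
        predictions = model.predict(payload["features"])
        # NumPy arrays are not JSON serializable; convert to plain lists.
        if hasattr(predictions, "tolist"):
            predictions = predictions.tolist()
        return jsonify(predictions=predictions)

    return app


def main(model_path="model.pkl"):
    # Unpickle once at startup rather than on every request.
    # Only unpickle files you created yourself: pickle can run arbitrary code.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    create_app(model).run(host="127.0.0.1", port=5000)
```

Calling `main()` starts the development server. Passing the model into `create_app` keeps the handler easy to exercise with Flask's test client and a stand-in model.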
Django is a full-featured framework, so it takes longer to learn than flask or bottle, but it has great documentation and a larger community.
Heroku is a service for hosting your application in the cloud. It's possible to host flask applications on Heroku; here is a simple template project + instructions to do so.
For "production" setups I would advise you not to use pickle but to write your own persistence layer for the machine learning model, so as to have full control over the parameters you store and be more robust to library upgrades that might break the unpickling of old models.
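As a rough illustration of what such a persistence layer might look like, the sketch below assumes the model is a linear classifier exposing `coef_`, `intercept_` and `classes_` (as scikit-learn's LogisticRegression does). The JSON layout, the `FORMAT_VERSION` tag and the function names are hypothetical; other model types would need their own parameters and prediction code.

```python
import json

import numpy as np

FORMAT_VERSION = 1  # hypothetical version tag for this file format


def export_linear_classifier(model):
    """Serialize a fitted linear classifier to a JSON string.

    Reads only coef_, intercept_ and classes_ from the model.
    """
    return json.dumps({
        "format_version": FORMAT_VERSION,
        "classes": np.asarray(model.classes_).tolist(),
        "coef": np.asarray(model.coef_).tolist(),
        "intercept": np.asarray(model.intercept_).tolist(),
    })


def load_linear_classifier(text):
    """Parse the JSON string and reject unknown format versions."""
    params = json.loads(text)
    if params["format_version"] != FORMAT_VERSION:
        raise ValueError("unsupported model file version")
    return params


def predict(params, X):
    """Predict class labels using only NumPy and the stored parameters."""
    classes = np.asarray(params["classes"])
    scores = np.asarray(X, dtype=float) @ np.asarray(params["coef"]).T
    scores = scores + np.asarray(params["intercept"])
    if scores.shape[1] == 1:
        # Binary case: a single coefficient row; positive score -> classes[1]
        return classes[(scores[:, 0] > 0).astype(int)]
    # Multiclass case: one coefficient row per class; pick the highest score
    return classes[scores.argmax(axis=1)]
```

Because the server only needs NumPy to score requests, a scikit-learn upgrade cannot break loading of old model files; the cost is that the prediction logic has to be reimplemented and checked against the original model.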
While this is not a classifier, I have implemented a simple machine learning web service using the bottle framework and scikit-learn. Given a dataset in .csv format, it returns 2D visualizations produced with principal component analysis (PCA) and linear discriminant analysis (LDA).
More information and example data files can be found at: http://mindwriting.org/blog/?p=153
Here is the implementation: upload.html:
<form
action="/plot" method="post"
enctype="multipart/form-data"
>
Select a file: <input type="file" name="upload" />
<input type="submit" value="PCA & LDA" />
</form>
pca_lda_viz.py (modify host name and port number):
import base64
import csv
import io

import matplotlib
matplotlib.use('Agg')  # headless backend: render without a display
import matplotlib.pyplot as plt
import numpy as np
from bottle import route, run, request, static_file
from matplotlib.font_manager import FontProperties
from sklearn.decomposition import PCA
# sklearn.lda.LDA was removed; the class now lives in discriminant_analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

html = '''
<html>
<body>
<img src="data:image/png;base64,{}" />
</body>
</html>
'''


@route('/')
def root():
    return static_file('upload.html', root='.')


@route('/plot', method='POST')
def plot():
    # Get the data: the first row is a header, the last column is the class
    upload = request.files.get('upload')
    lines = upload.file.read().decode('utf-8').splitlines()
    mydata = list(csv.reader(lines, delimiter=','))
    x = [row[:-1] for row in mydata[1:]]
    classes = [row[-1] for row in mydata[1:]]
    labels = sorted(set(classes))
    X = np.array(x).astype('float')
    y = np.array([labels.index(myclass) for myclass in classes])
    target_names = labels

    # Apply dimensionality reduction
    pca = PCA(n_components=2)
    X_r = pca.fit(X).transform(X)
    # LDA yields at most n_classes - 1 components, so 2D needs >= 3 classes
    lda = LinearDiscriminantAnalysis(n_components=2)
    X_r2 = lda.fit(X, y).transform(X)

    # Create 2D visualizations
    fig = plt.figure()
    ax = fig.add_subplot(1, 2, 1)
    bx = fig.add_subplot(1, 2, 2)
    fontP = FontProperties()
    fontP.set_size('small')
    colors = np.random.rand(len(labels), 3)  # one random RGB color per class
    for c, i, target_name in zip(colors, range(len(labels)), target_names):
        ax.scatter(X_r[y == i, 0], X_r[y == i, 1], color=c,
                   label=target_name)
    ax.legend(loc='upper center', bbox_to_anchor=(1.05, -0.05),
              fancybox=True, shadow=True, ncol=len(labels), prop=fontP)
    ax.set_title('PCA')
    ax.tick_params(axis='both', which='major', labelsize=6)
    for c, i, target_name in zip(colors, range(len(labels)), target_names):
        bx.scatter(X_r2[y == i, 0], X_r2[y == i, 1], color=c,
                   label=target_name)
    bx.set_title('LDA')
    bx.tick_params(axis='both', which='major', labelsize=6)

    # Encode image to png in base64
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    plt.close(fig)  # release the figure so memory does not grow per request
    data = base64.b64encode(buf.getvalue()).decode('ascii')
    return html.format(data)


run(host='mindwriting.org', port=8079, debug=True)
You can follow the tutorial below to deploy your scikit-learn model in Azure ML and get the web service automatically generated:
Build and Deploy a Predictive Web App Using Python and Azure ML
Alternatively, the combination of yHat + Heroku may also do the trick.
I'm working on a Docker image that wraps the predict and predict_proba methods and exposes them as a web API: https://github.com/hexacta/docker-sklearn-predict-http-api
You need to save your model:
import joblib  # replaces sklearn.externals.joblib, removed in scikit-learn 0.23
joblib.dump(clf, 'iris-svc.pkl')
create a Dockerfile:
FROM hexacta/sklearn-predict-http-api:latest
COPY iris-svc.pkl /usr/src/app/model.pkl
and run the container:
$ docker build -t iris-svc .
$ docker run -d -p 4000:8080 iris-svc
then you can make requests:
$ curl -H "Content-Type: application/json" -X POST -d '{"sepal length (cm)":4.4}' http://localhost:4000/predictproba
[{"0":0.8284069169,"1":0.1077571623,"2":0.0638359208}]
$ curl -H "Content-Type: application/json" -X POST -d '[{"sepal length (cm)":4.4}, {"sepal length (cm)":15}]' http://localhost:4000/predict
[0, 2]
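The same requests can be made from Python. This is only a sketch using the standard library; the `build_request` and `call_api` names are made up for illustration, and the base URL assumes the container is running locally on port 4000 as in the commands above.

```python
import json
import urllib.request


def build_request(base_url, rows, endpoint="predict"):
    """Build a POST request carrying `rows` as a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint}",
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(base_url, rows, endpoint="predict", timeout=10):
    """Send the request and decode the JSON response."""
    req = build_request(base_url, rows, endpoint)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `call_api("http://localhost:4000", [{"sepal length (cm)": 4.4}, {"sepal length (cm)": 15}])` should mirror the second curl call.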
You can use Plotly Dash for a demo or even for an app with limited scope.
See https://dash-gallery.plotly.host/Portal/ for examples with source code, including machine learning examples that use sklearn.
See https://dash.plotly.com/deployment for deployment, mainly with Heroku.
If you go the Flask route, I highly recommend that you watch the Corey Schafer series on YouTube. It's a solid series that will get you underway quickly, and there are many helpful notes from other viewers in the comment section.
Additionally, since I presume you'll build your models elsewhere and look to score them on your site, you will likely want to use pickle to store the model objects after development, and then load the model objects using pickle within your flask config.py.
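A small sketch of the save and load steps is below. The `MODEL_PATH` setting and the `save_model`/`load_model` helpers are hypothetical names, not part of Flask or scikit-learn; `save_model` would run once after training and `load_model` once at application startup.

```python
import os
import pickle

# Hypothetical setting: path of the pickled model, overridable via environment.
MODEL_PATH = os.environ.get("MODEL_PATH", "model.pkl")


def save_model(model, path=MODEL_PATH):
    """Run once after training to store the fitted model."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path=MODEL_PATH):
    """Run once at application startup, e.g. from config.py.

    Only load files you produced yourself: unpickling can execute code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

It is safest to unpickle with the same scikit-learn version that was used for training, since loading a model pickled by a different version is not supported.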