The final outcome of my work should be a Python function that takes a JSON object as the only input and return another JSON object as output. To keep it more specific, I am a data scientist, and the function that I am speaking about, is derived from data and it delivers predictions (in other words, it is a machine learning model).
So, my question is how to deliver this function to the "tech team" that is going to incorporate it into a web-service.
At the moment I face few problems. First, the tech team does not necessarily work in Python environment. So, they cannot just "copy and paste" my function into their code. Second, I want to make sure that my function runs in the same environment as mine. For example, I can imagine that I use some library that the tech team does not have or they have a version that differ from the version that I use.
ADDED
As a possible solution I consider the following. I start a Python process that listen to a socket, accept incoming strings, transforms them into JSON, gives the JSON to the "published" function and returns the output JSON as a string. Does this solution have disadvantages? In other words, is it a good idea to "publish" a Python function as a background process listening to a socket?
In Python, you can save the definitions of functions in a file called a module. It is possible to import module definitions into your program file. We can save our Python functions in their own file, which is a module, then the module is imported to the main program.
The body is a block of statements that will be executed when the function is called. The body of a Python function is defined by indentation in accordance with the off-side rule. This is the same as code blocks associated with a control structure, like an if or while statement.
Here are simple rules to define a function in Python. Function blocks begin with the keyword def followed by the function name and parentheses ( ( ) ). Any input parameters or arguments should be placed within these parentheses. You can also define parameters inside these parentheses.
You have the right idea with using a socket but there are tons of frameworks doing exactly what you want. Like hleggs, I suggest you checkout Flask to build a microservice. This will let the other team post JSON objects in an HTTP request to your flask application and receive JSON objects back. No knowledge of the underlying system or additional requirements required!
Here's a template for a flask app that replies and responds with JSON
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/', methods=['POST'])
def index():
json = request.json
return jsonify(your_function(json))
if __name__=='__main__':
app.run(host='0.0.0.0', port=5000)
Edit: embeded my code directly as per Peter Britain's advice
My understanding of your question boils down to:
How can I share a Python library with the rest of my team, that may not be using Python otherwise?
And how can I make sure my code and its dependencies are what the receiving team will run?
And that the receiving team can install things easily mostly anywhere?
This is a simple question with no straightforward answer... as you just mentioned that this may be integrated in some webservice, but you do not know the actual platform for this service.
You also ask:
As a possible solution I consider the following. I start a Python process that listen to a socket, accept incoming strings, transforms them into JSON, gives the JSON to the "published" function and returns the output JSON as a string. Does this solution have disadvantages? In other words, is it a good idea to "publish" a Python function as a background process listening to a socket?
In the most simple case and for starting I would say no in general. Starting network servers such as an HTTP server (which is built-in Python) is super easy. But a service (even if qualified as "micro") means infrastructure, means security, etc.
Etc, etc. When deployed, my experience with "simple" socket servers is that they end up being not so simple after all.
In most cases, it will be simpler to avoid redistributing a socket service at first. And the proposed approach here could be used to package a whole service at a later stage in a simpler way if you want.
What I suggest instead is a simple command line interface nicely packaged for installation.
The minimal set of things to consider would be:
Step 1. The simplest common denominator would be to provide a command line interface that accepts the path to a JSON file and spits JSON on the stdout. This would run on Linux, Mac and Windows.
The instructions here should work on Linux or Mac and would need a slight adjustment for Windows (only for the configure.sh
script further down)
A minimal Python script could be:
#!/usr/bin/env python
"""
Simple wrapper for calling a function accepting JSON and returning JSON.
Save to predictor.py and use this way::
python predictor.py sample.json
[
"a",
"b",
4
]
"""
from __future__ import absolute_import, print_function
import json
import sys
def predict(json_input):
"""
Return predictions as a JSON string based on the provided `json_input` JSON
string data.
"""
# this will error out immediately if the JSON is not valid
validated = json.loads(json_input)
# <....> your code there
with_predictions = validated
# return a pretty-printed JSON string
return json.dumps(with_predictions, indent=2)
def main():
"""
Print the JSON string results of a prediction, loading an input JSON file from a
file path provided as a command line argument.
"""
args = sys.argv[1:]
json_input = args[0]
with open(json_input) as inp:
print(predict(inp.read()))
if __name__ == '__main__':
main()
You can process eventually large inputs by passing the path to a JSON file.
Step 2. Package your function. In Python this is achieved by creating a setup.py
script. This takes care of installing any dependent code from Pypi too. This will ensure that the version of libraries you depend on are the ones you expect. Here I added nltk
as an example for a dependency. Add yours: this could be scikit-learn
, pandas
, numpy
, etc. This setup.py
also creates automatically a bin/predict
script which will be your main command line interface:
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
from __future__ import absolute_import, print_function
from setuptools import setup
from setuptools import find_packages
setup(
name='predictor',
version='1.0.0',
license='public domain',
description='Predict your life with JSON.',
packages=find_packages(),
# add all your direct requirements here
install_requires=['nltk >= 3.2, < 4.0'],
# add all your command line entry points here
entry_points={'console_scripts': ['predict = prediction.predictor:main']}
)
In addition as is common for Python and to make the setup code simpler I created a "Python package" directory moving the predictor inside this directory.
Step 3. You now want to package things such that they are easy to install. A simple configure.sh
script does the job. It installs virtualenv
, pip
and setuptools
, then creates a virtualenv
in the same directory as your project and then installs your prediction tool in there (pip install .
is essentially the same as python setup.py install
). With this script you ensure that the code that will be run is the code you want to be run with the correct dependencies. Furthermore, you ensure that this is an isolated installation with minimal dependencies and impact on the target system. This is tested with Python 2 but should work quite likely on Python 3 too.
#!/bin/bash
#
# configure and installs predictor
#
ARCHIVE=15.0.3.tar.gz
mkdir -p tmp/
wget -O tmp/venv.tgz https://github.com/pypa/virtualenv/archive/$ARCHIVE
tar --strip-components=1 -xf tmp/venv.tgz -C tmp
/usr/bin/python tmp/virtualenv.py .
. bin/activate
pip install .
echo ""
echo "Predictor is now configured: run it with:"
echo " bin/predict <path to JSON file>"
At the end you have a fully configured, isolated and easy to install piece of code with a simple highly portable command line interface. You can see it all in this small repo: https://github.com/pombredanne/predictor You just clone or fetch a zip or tarball of the repo, then go through the README and you are in business.
Note that for a more engaged way for more complex applications including vendoring the dependencies for easy install and not depend on the network you can check this https://github.com/nexB/scancode-toolkit I maintain too.
And if you really want to expose a web service, you could reuse this approach and package that with a simple web server (like the one built-in in the Python standard lib or bottle or flask or gunicorn) and provide configure.sh
to install it all and generate the command line to launch it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With