
Separating development and production parts of django project

I'm building a site that relies on the output of a machine learning algorithm. All the user-facing part of the site needs is the output of the algorithm (class labels for a set of items), which can easily be stored in and retrieved from the Django models. The algorithm could be run once a day, and does not rely on user input.

So this part of the site only depends on Django and related packages.

But developing, tuning, and evaluating the algorithm uses many other Python packages such as scikit-learn, pandas, numpy, matplotlib, etc. It also requires saving many different sets of class labels.

These dependencies cause some issues when deploying to Heroku, because numpy requires LAPACK/BLAS. It also seems like it would be good practice to have as few dependencies as possible in the deployed app.

How can I separate the machine-learning part from the user-facing part, but still keep them integrated enough that the results of the algorithm are easy to use?

I thought of creating two separate projects and then writing to the user-facing database in some way, but that seems like it would lead to maintenance problems (managing the dependencies, changes in database schemas, etc.).

As far as I understand, this problem is a little bit different than using different settings or databases for production and development, because it is more about managing different sets of dependencies.

asked Jul 31 '15 19:07 by ajerneck


2 Answers

Moving what we discussed into an answer in case other people have the same question. My suggestion is:

  1. Spend some time defining the dependencies for your site and for the algorithm code.

  2. Dump each project's dependency list into its own requirements.txt.

  3. Deploy them on different environments so the conflicts don't happen.

  4. Develop some API endpoints on your site side using Django Rest Framework or Tastypie and let your algorithm code update your model using the API. Use cron to run your algorithm code regularly and push the data.
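Step 4 could be sketched on the algorithm side roughly as follows. This is a minimal illustration using only the standard library, not a definitive implementation: the endpoint URL, the payload shape, and the function name `build_label_request` are all assumptions, and the `Token` authorization header assumes the site uses Django Rest Framework's token authentication.

```python
import json
import urllib.request


def build_label_request(labels, url, token):
    """Build (but do not send) a POST request carrying the class labels.

    `labels` maps item ids to class labels. `url` and `token` are
    hypothetical and depend on how the site's API is set up.
    """
    body = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumes DRF-style token authentication on the site side.
            "Authorization": "Token " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: replace with the real endpoint and credentials.
    request = build_label_request(
        {"item-1": "spam", "item-2": "ham"},
        "https://example.com/api/labels/",
        "secret-token",
    )
    # urllib.request.urlopen(request) would send it; it is left out here
    # so the sketch runs without a server.
    print(request.get_method(), request.full_url)
```

A cron entry on the machine that holds the scikit-learn/pandas environment would then run a script like this once a day, after the algorithm finishes, so the deployed site never needs those packages installed.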

answered Sep 27 '22 02:09 by Shang Wang


Create a requirements file for each environment, and a base requirements file for those packages shared by all the environments.

 $ mkdir requirements
 $ pip freeze > requirements/base.txt
 $ echo "-r base.txt" > requirements/development.txt
 $ echo "-r base.txt" > requirements/production.txt

Then adjust your development and production dependencies, and install each set in the proper environment:

 # switch to your development virtualenv
 # $ source .virtualenvs/development/bin/activate
 $ pip install -r requirements/development.txt

 # switch to your production virtualenv
 # $ source .virtualenvs/production/bin/activate
 $ pip install -r requirements/production.txt
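Because `pip freeze` captures everything installed in the current environment, base.txt may initially contain the machine-learning packages and need pruning. One way to check that the production set stays lean is a small script like the sketch below. It is only an illustration: the helper names `effective_requirements` and `forbidden_in` are hypothetical, and it handles only the simple `-r other.txt` include form used above, not everything pip accepts in a requirements file.

```python
import os


def effective_requirements(path):
    """Return the requirement lines in `path`, following `-r` includes.

    Only the simple `-r other.txt` form is handled; pip itself supports
    more options than this sketch does.
    """
    base_dir = os.path.dirname(os.path.abspath(path))
    result = []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("-r "):
                # Included files are resolved relative to the including file.
                included = os.path.join(base_dir, line[3:].strip())
                result.extend(effective_requirements(included))
            else:
                result.append(line)
    return result


def forbidden_in(path, forbidden):
    """List requirement lines in `path` that name a forbidden package."""
    hits = []
    for line in effective_requirements(path):
        # Take the package name before a version specifier such as "==".
        name = line.split("==")[0].split(">=")[0].split("<=")[0].strip().lower()
        if name in forbidden:
            hits.append(line)
    return hits
```

For example, `forbidden_in("requirements/production.txt", {"numpy", "pandas", "scikit-learn", "matplotlib"})` should come back empty once the production file and base.txt no longer list those packages.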

answered Sep 26 '22 02:09 by marcanuy