
How to use requirements.txt or similar for a pickle object

Problem

How can I dump a pickle object with its own dependencies?

The pickle object is generally generated from a notebook.

I tried creating a virtualenv for the notebook to track dependencies; however, that way I capture not only the imports the pickle object needs but also everything else used in other parts of the application. That works, but it's not the best solution.
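
(For context, what I mean by "the imports of the pickle object": the pickle itself records every class it needs to reconstruct the object, so the referenced modules could in principle be listed directly. A rough standard-library sketch of that idea; model.pickle and the helper name are placeholders, and note that older protocols record imports as GLOBAL opcodes while protocol 4+ typically uses STACK_GLOBAL preceded by two string opcodes.)

import pickletools

def referenced_modules(path):
    modules = set()
    with open(path, "rb") as f:
        data = f.read()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # arg looks like "sklearn.linear_model._base LinearRegression"
            modules.add(arg.split(" ")[0])
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"):
            # remember the last two string opcodes; STACK_GLOBAL consumes them
            recent_strings = (recent_strings + [arg])[-2:]
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) == 2:
            modules.add(recent_strings[0])
    return sorted(modules)

if __name__ == "__main__":
    for module in referenced_modules("model.pickle"):
        print(module)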

Background

What I'm trying to achieve

I'm trying to build an MLOps flow. Quick explanation: MLOps is a buzzword that's roughly DevOps for machine learning. There are different PaaS/SaaS solutions for it offered by different companies, and they commonly solve the following problems:

  • Automating the creation of web APIs from models
  • Handling requirements/dependencies
  • Storing and running the scripts used for model generation, the model binary, and the data sets.

I'll skip the storage part and focus on the first two.

How I'm trying to achieve it

In my case I'm trying to set up this flow using good old TeamCity, where the models are pickle objects generated by scikit-learn. The requirements are:

  • The dependencies must be explicitly defined
  • Pickle objects other than those produced by scikit-learn must be supported.
  • The workflow for a data scientist looks like this:
    • The data scientist uploads the pickle model together with a requirements.txt.
    • The data scientist commits a definition file which looks like this:
     apiPort: 8080
     apiName: name-tagger
     model: model-repository.internal/model.pickle
     requirements: model-repository.internal/model.requirements
     predicterVersion: 1.0
    
    • where the predicter is a Flask application with its own requirements.txt. It's an API wrapper/layer for a pickle model that loads the model into memory and serves predictions from a REST endpoint (a minimal sketch follows).
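
A minimal sketch of such a predicter, assuming the model file sits next to the app as model.pickle and the request body is JSON of the form {"instances": [[...], ...]}; the route and field names are illustrative only:

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# load the model once at startup and keep it in memory
with open("model.pickle", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    result = model.predict(payload["instances"])
    # convert numpy scalars to plain Python types so they serialize to JSON
    return jsonify(predictions=[p.item() if hasattr(p, "item") else p for p in result])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # apiPort from the definition file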

Then a build configuration in TeamCity parses the file and executes the following steps (a rough sketch of the scripted part follows the list):

  1. Parse the definition file.
  2. Find the predicter code
  3. Copy the pickle model as model.pickle into the predicter application's root folder
  4. Merge the requirements.txt of the predicter with the requirements.txt of the pickle model
  5. Create a virtualenv, install the dependencies, and push the result as a wheel
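
A rough sketch, in Python, of how steps 1, 3 and 4 could look inside that build step. All paths (definition.yaml, downloads/, predicter/) and helper names are placeholders, and the definition file is treated as plain "key: value" lines rather than parsed with a YAML library:

from pathlib import Path
import shutil

def parse_definition(path):
    # naive "key: value" parser for the definition file shown above
    definition = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            definition[key.strip()] = value.strip()
    return definition

def merge_requirements(predicter_req, model_req, out_path):
    # naive union of the two requirement lists; a real build should also
    # detect and resolve conflicting version pins
    lines = set(Path(predicter_req).read_text().splitlines())
    lines |= set(Path(model_req).read_text().splitlines())
    Path(out_path).write_text("\n".join(sorted(l for l in lines if l.strip())) + "\n")

if __name__ == "__main__":
    definition = parse_definition("definition.yaml")                   # step 1
    shutil.copy("downloads/model.pickle", "predicter/model.pickle")    # step 3
    merge_requirements("predicter/requirements.txt",                   # step 4
                       "downloads/model.requirements",
                       "predicter/requirements.txt")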

As the output of the flow I get a package containing a REST API that consumes a pickle model and exposes it on the defined port.

U. Bulle asked Oct 15 '22

1 Answer

For such complex build steps I use a Makefile on on-prem systems, and for cloud-based MLOps something like AWS CodeBuild with SageMaker.

An example of packaging the dependencies and executing the build steps below requires three files: main.py containing the driver function of your code, a Pipfile containing the dependencies for your virtualenv and models, and a Makefile:

  1. main.py
def main():
    # driver code that trains / uses the model
    do_something()

if __name__ == "__main__":
    main()

  2. Pipfile
[[source]]
url = 'https://pypi.python.org/simple'
verify_ssl = true
name = 'pypi'

[requires]
# must match the interpreter the Makefile asks pipenv for (--three)
python_version = '3.7'

# Note: pipenv only installs from [packages]/[dev-packages]; the per-model
# sections below are custom and need to be merged into [packages] by the
# build before running `pipenv install`.
[common-packages]
scipy = ">=0.17.0"
pandas = "*"

[model1-packages]
numpy = ">=1.11.0"

[model2-packages]
numpy = "==1.0.0"
  3. Makefile
.DEFAULT_GOAL := run

init:
    # create the virtualenv (Python 3) and install the Pipfile dependencies;
    # the other targets use `pipenv run` to execute inside that virtualenv
    pipenv --three install

analyze:
    pipenv run flake8 ./src

run_tests:
    pipenv run pytest --cov=src test/jobs/

run:
    # cleanup
    find . -name '__pycache__' | xargs rm -rf

    # run the job inside the pipenv-managed virtualenv
    pipenv run python main.py

After you customize these three files for your use case, the process can be executed with the following command:

make run
Ankush Chauhan answered Oct 20 '22