
How to use requirements.txt or similar for a pickle object

Problem

How can I dump a pickle object with its own dependencies?

The pickle object is generally generated from a notebook.

I tried creating a virtualenv for the notebook to track dependencies; however, that way I capture not only the imports the pickle object needs but also everything else used in other parts of the application. That works, but it's not the best solution.
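
(For context, what I mean by "the imports of the pickle object": the pickle itself records every class it needs to reconstruct the object, so the referenced modules could in principle be listed directly. A rough standard-library sketch of that idea; model.pickle and the helper name are placeholders, and note that older protocols record imports as GLOBAL opcodes while protocol 4+ typically uses STACK_GLOBAL preceded by two string opcodes.)

import pickletools

def referenced_modules(path):
    modules = set()
    with open(path, "rb") as f:
        data = f.read()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # arg looks like "sklearn.linear_model._base LinearRegression"
            modules.add(arg.split(" ")[0])
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"):
            # remember the last two string opcodes; STACK_GLOBAL consumes them
            recent_strings = (recent_strings + [arg])[-2:]
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) == 2:
            modules.add(recent_strings[0])
    return sorted(modules)

if __name__ == "__main__":
    for module in referenced_modules("model.pickle"):
        print(module)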

Background

What I'm trying to achieve

I'm trying to build an MLOps flow. Quick explanation: MLOps is a buzzword that's roughly DevOps for machine learning. There are different PaaS/SaaS solutions for it offered by different companies, and they commonly solve the following problems:

  • Automating the creation of web APIs from models
  • Handling requirements/dependencies
  • Storing and running the scripts used for model generation, the model binary, and the data sets.

I'll skip the storage part and focus on the first two.

How I'm trying to achieve it

In my case I'm trying to set up this flow using good old TeamCity, where the models are pickle objects generated by scikit-learn. The requirements are:

  • The dependencies must be explicitly defined
  • Pickle objects other than those produced by scikit-learn must be supported.
  • The workflow for a data scientist looks like this:
    • The data scientist uploads the pickle model together with a requirements.txt.
    • The data scientist commits a definition file which looks like this:
     apiPort: 8080
     apiName: name-tagger
     model: model-repository.internal/model.pickle
     requirements: model-repository.internal/model.requirements
     predicterVersion: 1.0
    
    • where the predicter is a Flask application with its own requirements.txt. It's an API wrapper/layer for a pickle model that loads the model into memory and serves predictions from a REST endpoint (a minimal sketch follows).
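
A minimal sketch of such a predicter, assuming the model file sits next to the app as model.pickle and the request body is JSON of the form {"instances": [[...], ...]}; the route and field names are illustrative only:

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# load the model once at startup and keep it in memory
with open("model.pickle", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    result = model.predict(payload["instances"])
    # convert numpy scalars to plain Python types so they serialize to JSON
    return jsonify(predictions=[p.item() if hasattr(p, "item") else p for p in result])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # apiPort from the definition file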

Then a build configuration in TeamCity parses the file and executes the following steps (a rough sketch of the scripted part follows the list):

  1. Parse the definition file.
  2. Find the predicter code
  3. Copy the pickle model as model.pickle into the predicter application's root folder
  4. Merge the requirements.txt of the predicter with the requirements.txt of the pickle model
  5. Create a virtualenv, install the dependencies, and push the result as a wheel
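
A rough sketch, in Python, of how steps 1, 3 and 4 could look inside that build step. All paths (definition.yaml, downloads/, predicter/) and helper names are placeholders, and the definition file is treated as plain "key: value" lines rather than parsed with a YAML library:

from pathlib import Path
import shutil

def parse_definition(path):
    # naive "key: value" parser for the definition file shown above
    definition = {}
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            definition[key.strip()] = value.strip()
    return definition

def merge_requirements(predicter_req, model_req, out_path):
    # naive union of the two requirement lists; a real build should also
    # detect and resolve conflicting version pins
    lines = set(Path(predicter_req).read_text().splitlines())
    lines |= set(Path(model_req).read_text().splitlines())
    Path(out_path).write_text("\n".join(sorted(l for l in lines if l.strip())) + "\n")

if __name__ == "__main__":
    definition = parse_definition("definition.yaml")                   # step 1
    shutil.copy("downloads/model.pickle", "predicter/model.pickle")    # step 3
    merge_requirements("predicter/requirements.txt",                   # step 4
                       "downloads/model.requirements",
                       "predicter/requirements.txt")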

As the output of the flow I get a package containing a REST API that consumes a pickle model and exposes it on the defined port.

U. Bulle asked Oct 15 '22

1 Answer

For such complex build steps I use a Makefile on on-prem systems, and for cloud-based MLOps something like AWS CodeBuild with SageMaker.

An example of packaging the dependencies and executing the build steps below requires three files: main.py containing the driver function of your code, a Pipfile containing the dependencies for your virtualenv and models, and a Makefile:

  1. main.py
def main():
    # driver code that trains / uses the model
    do_something()

if __name__ == "__main__":
    main()

  2. Pipfile
[[source]]
url = 'https://pypi.python.org/simple'
verify_ssl = true
name = 'pypi'

[requires]
# must match the interpreter the Makefile asks pipenv for (--three)
python_version = '3.7'

# Note: pipenv only installs from [packages]/[dev-packages]; the per-model
# sections below are custom and need to be merged into [packages] by the
# build before running `pipenv install`.
[common-packages]
scipy = ">=0.17.0"
pandas = "*"

[model1-packages]
numpy = ">=1.11.0"

[model2-packages]
numpy = "==1.0.0"
  3. Makefile
.DEFAULT_GOAL := run

init:
    # create the virtualenv (Python 3) and install the Pipfile dependencies;
    # the other targets use `pipenv run` to execute inside that virtualenv
    pipenv --three install

analyze:
    pipenv run flake8 ./src

run_tests:
    pipenv run pytest --cov=src test/jobs/

run:
    # cleanup
    find . -name '__pycache__' | xargs rm -rf

    # run the job inside the pipenv-managed virtualenv
    pipenv run python main.py

After you customize these three files for your use case, the process can be executed with the following command:

make run
Ankush Chauhan answered Oct 20 '22