How can I dump a pickle object with its own dependencies?
The pickle object is generally generated from a notebook.
I tried creating a virtualenv
for the notebook to track dependencies; however, that way I capture not only the imports the pickle object needs but also everything else used in other parts of the application. That works, but it isn't the best solution.
I'm trying to build an MLOps flow. Quick explanation: MLOps is a buzzword that is essentially DevOps for machine learning. Different companies offer PaaS/SaaS solutions for it, and they commonly solve the same problems: packaging models together with their dependencies, serving them behind an API, and storing/versioning them.
I'll skip the storage part and focus on the first two.
In my case I'm trying to set up this flow using good old TeamCity, where the models are pickle objects generated by scikit-learn. The inputs are the model pickle, its requirements.txt, and a deployment descriptor like:

apiPort: 8080
apiName: name-tagger
model: model-repository.internal/model.pickle
requirements: model-repository.internal/model.requirements
predicterVersion: 1.0
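For illustration, a build step can read a flat descriptor like this with a few lines of stdlib Python. This is a sketch under the assumption that the file stays simple `key: value` pairs (a value containing a colon, e.g. a URL with `://`, would need a real YAML parser), and `parse_descriptor` is a hypothetical helper name:

```python
def parse_descriptor(text: str) -> dict:
    """Parse a flat 'key: value' deployment descriptor (stdlib-only sketch)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config
```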
The predicter (versioned above as predicterVersion) is an API wrapper/layer around a pickle model: it loads the model into memory and serves predictions from a REST endpoint. A build configuration in TeamCity then parses the descriptor, fetches the model and its requirements file, and merges the requirements.txt of the predicter with the requirements.txt of the pickle model. As output of the flow I get a package containing a REST API that consumes a pickle model and exposes it on the defined port.
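The merge step could be sketched as follows. One assumption I'm making about conflict resolution: the model's pins win, since the model was trained against those exact versions; `merge_requirements` is a hypothetical helper name:

```python
import re

def merge_requirements(predicter_reqs: str, model_reqs: str) -> str:
    """Merge two requirements.txt payloads; the model's pins take precedence."""
    def parse(text):
        pins = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # package name is everything before the first version operator
            name = re.split(r"[=<>!~\[ ;]", line, maxsplit=1)[0].lower()
            pins[name] = line
        return pins

    merged = parse(predicter_reqs)    # predicter pins first ...
    merged.update(parse(model_reqs))  # ... model pins override on conflict
    return "\n".join(merged[name] for name in sorted(merged))
```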
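The predicter layer itself can be sketched with nothing but the stdlib. This is a minimal sketch, not the actual predicter: a real deployment would likely use Flask/FastAPI behind gunicorn, and `MODEL_PATH` here stands in for the path resolved from the descriptor:

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_PATH = "model.pickle"  # stand-in for the path from the descriptor

class PredictHandler(BaseHTTPRequestHandler):
    model = None  # loaded once at startup, shared by all requests

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))  # e.g. [1.0, 2.0, 3.0]
        # scikit-learn style predict(); assumes JSON-serializable outputs
        prediction = self.model.predict([features])
        body = json.dumps({"prediction": list(prediction)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(model_path=MODEL_PATH, port=8080):
    with open(model_path, "rb") as fh:
        PredictHandler.model = pickle.load(fh)  # load the model into memory once
    HTTPServer(("", port), PredictHandler).serve_forever()
```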
For complex build steps like these I use a Makefile on on-prem systems; for cloud-based MLOps I use something like AWS CodeBuild with SageMaker. As an example, packaging the dependencies and executing the build steps below requires three files: main.py containing the driver function of your code, a Pipfile declaring the dependencies of your virtualenv and models, and a Makefile:
def main():
    do_something()  # placeholder for the actual training/prediction logic

if __name__ == "__main__":
    main()
[[source]]
url = 'https://pypi.python.org/simple'
verify_ssl = true
name = 'pypi'

[requires]
python_version = '2.7'

[common-packages]
scipy = '>=0.17.0'
pandas = '*'

[model1-packages]
numpy = '>=1.11.0'

[model2-packages]
numpy = '==1.0.0'
.DEFAULT_GOAL := run

init:
	pipenv install  # the Python version comes from [requires] in the Pipfile
	pipenv shell

analyze:
	flake8 ./src

run_tests:
	pytest --cov=src test/jobs/

run:
	# cleanup
	find . -name '__pycache__' | xargs rm -rf
	# run the job
	python main.py
After you customize these three files for your use case, the whole process can be executed with:
make run