I have a saved model for Sentiment Analysis and code and data along with it. I am trying to create a library that will have functionalities from this code and uses this trained model. I do not get how will I incorporate the model and functionalities dependent upon it.
Can anyone guide me on how to do that specifically?
Edit: Using pickle is the method I went with (answered below)
You need to get three things right if you want to maintain such a library properly: a sensible project layout, the files that make it up, and a release workflow.
There are a few ways to do this; the most user-friendly at the moment is probably poetry, so I'll use it as an example. You'll need to have poetry installed if you want to follow this post as a tutorial.
In order to have some very basic project skeleton to work with, I'll just assume that you have something similar to this:
```
modelpersister
├───modelpersister
│   ├───model.pkl
│   ├───__init__.py
│   ├───model_definition.py
│   ├───train.py
│   └───analyze.py
└───pyproject.toml
```
- model.pkl: the model artifact that you're going to ship with your package
- __init__.py: empty, needs to be there to make this folder a Python package
- model_definition.py: contains the class definition and features that define your model
- train.py: accepts data to train your model and overwrites the current model.pkl file with the result, something roughly like this:

```python
import pickle
from pathlib import Path

from modelpersister.model_definition import SentimentAnalyzer

# overwrite the current model given some new data
def train(data):
    model = SentimentAnalyzer.train(data)
    # pickle writes bytes, so the file must be opened in binary mode
    with open(Path(__file__).parent / "model.pkl", "wb") as model_file:
        pickle.dump(model, model_file)
```
- analyze.py: accepts data points to analyze given the current model.pkl, something roughly like this:

```python
import importlib.resources
import pickle

from modelpersister.model_definition import SentimentAnalyzer

# load the current model as a package resource (small but important detail)
with importlib.resources.path("modelpersister", "model.pkl") as model_path:
    # pickle.load expects a binary file object, not a path
    with open(model_path, "rb") as model_file:
        model: SentimentAnalyzer = pickle.load(model_file)

# make meaningful analyses available in this file
def estimate(data_point):
    return model.estimate(data_point)
```
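To sanity-check the dump/load round trip outside the package, here is a minimal self-contained sketch; the SentimentAnalyzer below is a trivial stand-in class, not the real model:

```python
import pickle

class SentimentAnalyzer:
    """Trivial stand-in for the real model class."""
    def estimate(self, data_point):
        return 1.0 if "good" in data_point else 0.0

# dump a "trained" model, exactly as train.py would
model = SentimentAnalyzer()
with open("model.pkl", "wb") as model_file:
    pickle.dump(model, model_file)

# load it back, exactly as analyze.py would at import time
with open("model.pkl", "rb") as model_file:
    loaded = pickle.load(model_file)

print(loaded.estimate("a good movie"))  # prints 1.0
```

Note that unpickling requires the class definition to be importable at load time, which is exactly why model_definition.py ships inside the package alongside model.pkl.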
- pyproject.toml: a metadata file that poetry needs in order to package this code, something very similar to this:

```toml
[tool.poetry]
name = "modelpersister"
version = "0.1.0"
description = "Ship a sentiment analysis model."
authors = ["Mishaal <[email protected]>"]
license = "MIT"  # a good default as far as licenses go

[tool.poetry.dependencies]
python = "^3.8"
sklearn = "^0.23"  # or whichever ML library you used for your model definition

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
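One detail worth checking: depending on your poetry version, non-Python files like model.pkl may need to be listed explicitly so they end up in the built archives. Poetry's include key is the mechanism for that; whether you actually need it is best confirmed by inspecting the wheel that poetry build produces:

```toml
[tool.poetry]
# ...name, version, and the other keys shown above...
include = ["modelpersister/model.pkl"]  # make sure the model ships with the package
```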
Once all of these files are filled with meaningful code (and the project hopefully has a better name than modelpersister), your workflow will look roughly like this:
1. Improve your model in model_definition.py, train it with train.py on better data, or add new functions in analyze.py, until you feel that the model is noticeably better than before
2. Run poetry version minor to update the package version
3. Run poetry build to build your code and model into a source distribution and a wheel file that you can, if you want, run some final tests on
4. Run poetry publish to distribute your package - by default to the global Python Package Index, but you can also set up a private PyPI instance and tell poetry about it, or upload the files manually somewhere else