'PythonModelContext' object returned from mlflow.pyfunc.load_model - how to retrieve original model that was saved

Question

I am creating a custom mlflow.pyfunc object that I would like to save to MLflow and retrieve later. I don't understand the relationship between the object that is saved with mlflow.pyfunc.save_model() and the one that is retrieved with mlflow.pyfunc.load_model().

The loaded model is a 'PythonModelContext' object rather than my original python class. When I try to use the predict method in the loaded version I get an error.

Here I initialise MLflow and create a dummy example of my class:

# load libraries
import os
import tempfile
from pathlib import Path
import pandas as pd
import mlflow
from mlflow.tracking import MlflowClient
import mlflow.pyfunc
from mlflow.pyfunc import PythonModelContext

# initialise MLflow
mlflow_var = os.getenv('HYMIND_REPO_TRACKING_URI')
mlflow.set_tracking_uri(mlflow_var)

client = MlflowClient()

# Define the class that will be used for fit and predict (dummy example)
class PredictSpeciality(mlflow.pyfunc.PythonModel):
    
    def fit(self):
        print('fit')
        d = {'col1': [1, 2], 'col2': [3, 4]}
        df = pd.DataFrame(data=d)
        return df
           
    def predict(self, X, y=None):
        print('predict')
        print(X.shape)
        return

If I now run the class as it is the predict method works:

# Use of this predictor before saving works fine 
m = PredictSpeciality()
df = m.fit()
m.predict(df)

But if I save the model to the registry, and then re-load it, the predict method no longer works:

# counter is assumed to exist already (e.g. counter = 0 in an earlier notebook cell)
counter += 1
exp_name = 'MLflow-test-' + str(counter)

os.environ["MLFLOW_EXPERIMENT_NAME"] = exp_name
experiment_id = mlflow.create_experiment(exp_name)

mlflow.set_experiment(exp_name)
experiment = dict(mlflow.get_experiment_by_name(exp_name))
experiment_id = experiment['experiment_id']

with mlflow.start_run():
    
    # dummy code here for fitting a model
    m = PredictSpeciality()
    df = m.fit()
    
# mark best run
runs = mlflow.search_runs()
best_run_id = runs['run_id'][0]

# tag the best run and save model
with mlflow.start_run(run_id=best_run_id):
    mlflow.set_tag('best_run_', 1)   

    mlflow_model_path = f'/data/hymind/repo/{experiment_id}/{best_run_id}/artifacts/model/'
    mlflow.pyfunc.save_model(path=mlflow_model_path, python_model=m)
    
# end experiment and register best model
model_name = 'MLflow-test' + str(counter)
registered_model = mlflow.register_model(f'runs:/{best_run_id}/model', model_name)

# now attempt to make a prediction using the loaded model
model_version = 1
m = mlflow.pyfunc.load_model(f"models:/{model_name}/{model_version}")
m.predict(df)

In this case, I get the following AttributeError:

AttributeError: 'PythonModelContext' object has no attribute 'shape'

How do I get the original model back from the 'PythonModelContext' object?

Daniel Schneider · Accepted Answer

If you take a close look at the signature of the abstract method predict() in the mlflow.pyfunc.PythonModel class that you are extending, you will see that it has three parameters, including self:

def predict(self, context, model_input):

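The loaded wrapper calls this method with a PythonModelContext as the first argument after self, so in your version X receives the context (hence the AttributeError on X.shape) and the DataFrame is bound to y. If you add the extra context parameter to your class, your example should work: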

class PredictSpeciality(mlflow.pyfunc.PythonModel):
    def fit(self):
        print('fit')
        d = {'col1': [1, 2], 'col2': [3, 4]}
        df = pd.DataFrame(data=d)
        return df
           
    def predict(self, context, X, y=None):
        print('predict')
        print(X.shape)
        return
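A minimal plain-Python sketch of the argument mismatch, runnable without MLflow. StandInContext and StandInWrapper are hypothetical stand-ins, not MLflow classes; the only behaviour assumed is that the wrapper forwards predict() to the wrapped model as predict(context, model_input).

```python
class StandInContext:
    """Hypothetical stand-in for the context object the wrapper passes."""


class StandInWrapper:
    """Hypothetical stand-in for the loaded wrapper: it forwards predict()
    to the wrapped model with the context as the first argument."""

    def __init__(self, python_model):
        self.python_model = python_model
        self.context = StandInContext()

    def predict(self, model_input):
        return self.python_model.predict(self.context, model_input)


class OldSignature:
    def predict(self, X, y=None):
        # X is bound to the context; the real input ends up in y
        return type(X).__name__


class NewSignature:
    def predict(self, context, X, y=None):
        # context absorbs the first argument, so X is the real input
        return type(X).__name__


data = [[1, 3], [2, 4]]
old_result = StandInWrapper(OldSignature()).predict(data)
new_result = StandInWrapper(NewSignature()).predict(data)
print(old_result)  # name of the type bound to X with the old signature
print(new_result)  # name of the type bound to X with the new signature
```

With the old signature X is the context object, which is why attribute access such as X.shape fails; with the new signature X is the data that was passed in.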

To elaborate a bit more on what is going on here: There are 2 classes at play: mlflow.pyfunc.PythonModel and mlflow.pyfunc.PyFuncModel.

The mlflow.pyfunc.PythonModel is wrapped by the mlflow.pyfunc.PyFuncModel. The former does the actual work; the latter deals with the metadata, packaging, conda environment, etc. The documentation explains it like so:

Python function models are loaded as an instance of mlflow.pyfunc.PyFuncModel, which is an MLflow wrapper around the model implementation and model metadata (MLmodel file).

Unfortunately, the documentation also states that you cannot create a PyFuncModel directly, but can only obtain one from mlflow.pyfunc.load_model():

Wrapper around model implementation and metadata. This class is not meant to be constructed directly. Instead, instances of this class are constructed and returned from mlflow.pyfunc.load_model().

I find that quite limiting and am unsure why it was designed this way. However, there are two things that you can do here:

1. Pass in an extra parameter when directly dealing with your wrapped class:

   m.predict(None, df)

2. Save and load the model to get an mlflow.pyfunc.PyFuncModel:

   mlflow.pyfunc.save_model(path="temp_model", python_model=m)
   m2 = mlflow.pyfunc.load_model("temp_model")
   m2.predict(df)

I know it isn't elegant, but I have actually been using #2 in the past. It would be good if someone from the MLflow team could comment on why direct creation of an mlflow.pyfunc.PyFuncModel is not supported.
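For the first option, the None placeholder can be hidden behind a small helper method. This is only a sketch, assuming predict() never reads from context; predict_direct is a hypothetical name, and the class below omits the mlflow.pyfunc.PythonModel base class only so that the snippet runs without MLflow installed.

```python
class PredictSpeciality:  # in real use: PredictSpeciality(mlflow.pyfunc.PythonModel)
    def predict(self, context, model_input):
        # context is unused in this dummy model; return the number of rows
        return len(model_input)

    def predict_direct(self, model_input):
        # hypothetical helper: supplies the placeholder context for direct calls
        return self.predict(None, model_input)


m = PredictSpeciality()
rows = [[1, 3], [2, 4]]
direct_result = m.predict_direct(rows)
explicit_result = m.predict(None, rows)
print(direct_result, explicit_result)
```

The helper only saves typing. A model whose predict() actually uses context (for example, to read artifacts) would still need the save-and-load route.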

Tags: python, mlflow

zmek
