
Using the same preprocessing code for both training and inference in SageMaker

I am working on building a machine learning pipeline for time series data where the goal is to retrain and update the model frequently to make predictions.

  • I have written preprocessing code that handles the time series variables and transforms them.

I am confused about how to use the same preprocessing code for both training and inference. Should I write a Lambda function to preprocess my data, or is there another way?

Sources looked into:

The two examples given by the AWS SageMaker team use AWS Glue to do the ETL transform.

inference_pipeline_sparkml_xgboost_abalone

inference_pipeline_sparkml_blazingtext_dbpedia

I am new to AWS SageMaker and am trying to learn, understand, and build the flow. Any help is appreciated!

Sandy asked Nov 15 '22 21:11


1 Answer

Answering the questions in reverse order.

From your example, the line below is the inference pipeline, where two models are chained together. Here we need to remove sparkml_model and put our sklearn model in its place.

from sagemaker.pipeline import PipelineModel

sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])

Before placing the sklearn model in the pipeline, we need the SageMaker version of the SKLearn model.

First, create the SKLearn Estimator using the SageMaker Python SDK.

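from sagemaker.sklearn.estimator import SKLearn

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    # SageMaker Python SDK v2 and later rename this to instance_type
    # and also require a framework_version argument.
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session)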

script_path - the Python script that contains all the preprocessing or transformation logic ('sklearn_abalone_featurizer.py' in the notebook linked below).
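To show how one script can serve both training and inference, here is a minimal sketch of what such a file might look like. It is not the featurizer from the linked notebook. The per-column standardization, the scaler.json file name, and the helper names (fit_scaler, transform, parse_csv, save_scaler) are all assumptions for illustration, and your own time series transforms would take their place. The parts that follow SageMaker conventions are the model_fn, input_fn and predict_fn hooks and the SM_CHANNEL_TRAIN and SM_MODEL_DIR environment variables. A real script would more typically fit a scikit-learn transformer and save it with joblib. An output_fn is omitted here, so response serialization is left to the container defaults.

```python
import csv
import io
import json
import os
import statistics


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return {"mean": means, "std": stds}


def transform(rows, scaler):
    """Apply fitted parameters. Training and inference both go through this function."""
    return [
        [(x - m) / s for x, m, s in zip(row, scaler["mean"], scaler["std"])]
        for row in rows
    ]


def parse_csv(text):
    """Parse header-less numeric CSV text into a list of float rows."""
    return [[float(v) for v in rec] for rec in csv.reader(io.StringIO(text)) if rec]


def save_scaler(scaler, model_dir):
    """Persist the fitted parameters (hypothetical file name scaler.json)."""
    with open(os.path.join(model_dir, "scaler.json"), "w") as f:
        json.dump(scaler, f)


# --- Inference hooks looked up by the SageMaker scikit-learn serving container ---

def model_fn(model_dir):
    """Load the parameters that the training run saved."""
    with open(os.path.join(model_dir, "scaler.json")) as f:
        return json.load(f)


def input_fn(input_data, content_type):
    """Decode the request body; only CSV is handled in this sketch."""
    if content_type != "text/csv":
        raise ValueError("Unsupported content type: " + str(content_type))
    if isinstance(input_data, bytes):
        input_data = input_data.decode("utf-8")
    return parse_csv(input_data)


def predict_fn(input_data, model):
    """Run the same transform that was fitted during training."""
    return transform(input_data, model)


if __name__ == "__main__":
    # Training entry point: SageMaker sets these variables inside the training job.
    train_dir = os.environ.get("SM_CHANNEL_TRAIN")
    model_dir = os.environ.get("SM_MODEL_DIR")
    if train_dir and model_dir:
        rows = []
        for name in sorted(os.listdir(train_dir)):
            with open(os.path.join(train_dir, name)) as f:
                rows.extend(parse_csv(f.read()))
        save_scaler(fit_scaler(rows), model_dir)
```

Because the fitted parameters are written during training and read back by model_fn, every retraining run refreshes the preprocessing state along with the model, and no separate Lambda function is needed for inference.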

Train the SKLearn Estimator

sklearn_preprocessor.fit({'train': train_input})

Create the SageMaker model from the SKLearn Estimator so that it can be placed in the inference pipeline.

sklearn_inference_model = sklearn_preprocessor.create_model()

The inference PipelineModel creation is then modified as shown below.

sm_model = PipelineModel(name=model_name, role=role, models=[sklearn_inference_model, xgb_model])

For more details, refer to the link below.

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb

solver149 answered May 16 '23 06:05