Preprocess input data before making predictions inside Amazon SageMaker [duplicate]

I have a Keras/TensorFlow model that we trained ourselves for image-related prediction. I followed this trained Keras model tutorial to deploy the model in SageMaker, and I can invoke the endpoint for prediction.

In my client-side code, before calling the SageMaker endpoint for a prediction, I need to download the image and do some preprocessing. Instead of doing this on the client side, I want to do the entire process in SageMaker. How do I do that?

It seems I need to update the entry point Python script train.py, as mentioned here:

sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  entry_point = 'train.py')

Other articles indicate that I need to override the input_fn function to handle the preprocessing, but those articles describe the steps for the MXNet framework, and my model is based on Keras/TensorFlow.

So I am not sure how to override the input_fn function. Can anyone please suggest?

nad asked Mar 31 '26 18:03


1 Answer

I had the same problem and finally figured out how to do it.

Once you have your model_data ready, you can deploy it with the following lines:

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(
    model_data='s3://path/to/model/model.tar.gz',
    role=role,
    framework_version='1.12',
    entry_point='train.py',
    source_dir='my_src',
    env={'SAGEMAKER_REQUIREMENTS': 'requirements.txt'}
)

predictor = sagemaker_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge', 
    endpoint_name='resnet-tensorflow-classifier'
)
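
Once the endpoint is up, a client can send the raw JPEG bytes directly, and the ContentType it sets is what the entry point's input_fn receives as content_type. A minimal sketch of that call, assuming boto3 is installed and AWS credentials are configured; build_invoke_args and invoke are illustrative helper names, not part of the SageMaker SDK:

```python
def build_invoke_args(image_bytes, endpoint_name="resnet-tensorflow-classifier"):
    """Assemble keyword arguments for the SageMaker runtime invoke_endpoint call."""
    if not isinstance(image_bytes, (bytes, bytearray)):
        raise TypeError("image_bytes must be raw bytes")
    return {
        "EndpointName": endpoint_name,
        "ContentType": "image/jpeg",  # compared against content_type in input_fn
        "Body": bytes(image_bytes),
    }


def invoke(image_path, endpoint_name="resnet-tensorflow-classifier"):
    """Send a local JPEG file to the endpoint and return the raw response body."""
    import boto3  # imported lazily so build_invoke_args works without boto3

    with open(image_path, "rb") as f:
        args = build_invoke_args(f.read(), endpoint_name)
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**args)
    return response["Body"].read()
```

Calling invoke('cat.jpg') (a hypothetical local file) would return the serialized prediction; how to decode it depends on the output format your endpoint uses.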

Your notebook should have a my_src directory which contains a file train.py and a requirements.txt file. The train.py file should have a function input_fn defined. For me, that function handled image/jpeg content:

import io
import numpy as np
from PIL import Image
from keras.applications.resnet50 import preprocess_input

JPEG_CONTENT_TYPE = 'image/jpeg'

# Deserialize the Invoke request body into an object we can perform prediction on
def input_fn(request_body, content_type=JPEG_CONTENT_TYPE):
    # Process an image uploaded to the endpoint
    if content_type == JPEG_CONTENT_TYPE:
        # Decode the JPEG bytes and resize to the input size the model expects
        img = Image.open(io.BytesIO(request_body)).resize((300, 300))
        img_array = np.array(img)
        # Add a leading batch dimension, e.g. (300, 300, 3) -> (1, 300, 300, 3) for an RGB image
        expanded_img_array = np.expand_dims(img_array, axis=0)
        # Apply the ResNet50-specific normalization
        x = preprocess_input(expanded_img_array)
        return x
    else:
        # Reject any content type this endpoint does not know how to decode
        raise ValueError('Unsupported content type: {}'.format(content_type))
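
The question also asks about downloading the image inside SageMaker rather than on the client. One way that might be handled is to accept a second content type that carries a URL and fetch the bytes before decoding. This is only a sketch under assumptions: it targets Python 3, the "url" field name and the fetch_image_bytes helper are hypothetical, and it presumes the endpoint container has outbound network access.

```python
import json
import urllib.request

JSON_CONTENT_TYPE = "application/json"
JPEG_CONTENT_TYPE = "image/jpeg"


def fetch_image_bytes(request_body, content_type):
    """Return raw image bytes, downloading them first if the request carries a URL."""
    if content_type == JPEG_CONTENT_TYPE:
        # The client already uploaded the image itself
        return request_body
    if content_type == JSON_CONTENT_TYPE:
        if isinstance(request_body, (bytes, bytearray)):
            request_body = request_body.decode("utf-8")
        payload = json.loads(request_body)
        url = payload["url"]  # hypothetical field name chosen for this sketch
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    raise ValueError("Unsupported content type: {}".format(content_type))
```

The returned bytes could then go through the same Image.open(io.BytesIO(...)) steps as an uploaded JPEG. Since the endpoint would fetch whatever URL it is given, you may want to restrict which hosts are allowed.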

Your processing code will depend on the model architecture you used. I was doing transfer learning off resnet50, so I used preprocess_input from keras.applications.resnet50.
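
To make "depends on the model architecture" concrete: for ResNet50, Keras' preprocess_input uses its "caffe" mode, which reorders channels from RGB to BGR and subtracts the per-channel ImageNet means without any scaling. A rough NumPy equivalent for channels-last input is sketched below; resnet50_style_preprocess is an illustrative name, and other architectures use different normalization.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by Keras' "caffe" preprocessing mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def resnet50_style_preprocess(rgb_batch):
    """Approximate keras.applications.resnet50.preprocess_input for channels-last input."""
    x = np.asarray(rgb_batch, dtype=np.float32)
    x = x[..., ::-1]              # reverse the channel axis: RGB -> BGR
    return x - IMAGENET_BGR_MEAN  # zero-center each channel; no scaling
```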

Note that since my train.py code imports some modules, I had to supply requirements.txt defining those modules (that was the part I had trouble finding in the docs).

Hope this helps someone in the future.

My requirements.txt:

absl-py==0.7.1
astor==0.8.0
backports.weakref==1.0.post1
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
gast==0.2.2
grpcio==1.20.1
h5py==2.9.0
Keras==2.2.4
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
Markdown==3.1.1
mock==3.0.5
numpy==1.16.3
Pillow==6.0.0
protobuf==3.7.1
PyYAML==5.1
scipy==1.2.1
six==1.12.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
termcolor==1.1.0
virtualenv==16.5.0
Werkzeug==0.15.4

alex9311 answered Apr 02 '26 06:04


