
How can I preprocess input data before making predictions in SageMaker?

I am calling a SageMaker endpoint using the Java SageMaker SDK. The data that I am sending needs a little cleaning before the model can use it for prediction. How can I do that in SageMaker?

I have a pre-processing function in the Jupyter notebook instance that cleans the training data before passing it to train the model. Now I want to know whether I can use that function while calling the endpoint, or whether it is already being used. I can show my code if anyone wants.

EDIT 1: Basically, in the pre-processing I am doing label encoding. Here is my function for preprocessing:

from sklearn import preprocessing

def preprocess_data(data):
    print("entering preprocess fn")
    # convert document id & type to integer labels
    le1 = preprocessing.LabelEncoder()
    le1.fit(data["documentId"])
    data["documentId"] = le1.transform(data["documentId"])
    le2 = preprocessing.LabelEncoder()
    le2.fit(data["documentType"])
    data["documentType"] = le2.transform(data["documentType"])
    print("exiting preprocess fn")
    return data, le1, le2

Here 'data' is a pandas DataFrame.

Now I want to use these fitted encoders (le1, le2) at the time of calling the endpoint. I want to do this preprocessing in SageMaker itself, not in my Java code.
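Whichever approach is taken, a prerequisite is being able to re-apply the exact mapping learned at training time. One common pattern (not shown in the question) is to persist the fitted encoders alongside the model artifacts; a minimal sketch using pickle, with toy data standing in for the real DataFrame:

```python
import pickle

import pandas as pd
from sklearn import preprocessing

# Fit the encoders on training data, mirroring preprocess_data above
data = pd.DataFrame({"documentId": ["a", "b", "a"],
                     "documentType": ["x", "y", "x"]})
le1 = preprocessing.LabelEncoder().fit(data["documentId"])
le2 = preprocessing.LabelEncoder().fit(data["documentType"])

# Persist a fitted encoder so it can ship with the model artifacts
with open("le_document_id.pkl", "wb") as f:
    pickle.dump(le1, f)

# At inference time, load the encoder and apply the identical mapping
with open("le_document_id.pkl", "rb") as f:
    le1_loaded = pickle.load(f)
encoded = le1_loaded.transform(["b", "a"])  # same labels as at training time
```

The pickle files can then be bundled into the model.tar.gz so any of the answers below can load them at inference time.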

asked Mar 30 '18 by gashu

People also ask

What is inference code in SageMaker?

To use your own inference code with a persistent endpoint to get one prediction at a time, use SageMaker hosting services. To use your own inference code to get predictions for an entire dataset, use SageMaker batch transform.

What is inference pipeline?

An inference pipeline is an Amazon SageMaker model composed of a linear sequence of two to fifteen containers that process requests for inference on data.

What are SageMaker endpoints?

An Amazon SageMaker endpoint is a fully managed service that allows you to make real-time inferences via a REST API.


3 Answers

There is now a new feature in SageMaker, called inference pipelines. This lets you build a linear sequence of containers (originally two to five, since raised to fifteen) that pre- and post-process requests. The whole pipeline is then deployed on a single endpoint.

https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html
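With the SageMaker Python SDK, such a pipeline can be assembled roughly as below. This is a sketch only: the container URIs, S3 paths, role ARN, and pipeline name are all placeholders, and deploying requires a real AWS account.

```python
# Sketch: requires the `sagemaker` Python SDK and existing model artifacts.
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

# First container pre-processes each request, second runs the actual model.
preprocess_model = Model(
    image_uri="<preprocessing-container-uri>",
    model_data="s3://<bucket>/preprocess/model.tar.gz",
    role="<execution-role-arn>",
)
predictor_model = Model(
    image_uri="<algorithm-container-uri>",
    model_data="s3://<bucket>/model/model.tar.gz",
    role="<execution-role-arn>",
)

pipeline = PipelineModel(
    name="doc-classifier-pipeline",
    role="<execution-role-arn>",
    models=[preprocess_model, predictor_model],
)
# Deploying creates a single endpoint that chains the containers in order:
# pipeline.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

The caller then invokes one endpoint; SageMaker passes the output of each container as the input of the next.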

answered Oct 28 '22 by Julien Simon


You need to write a script and supply it while creating your model. That script should have an input_fn where you can do your preprocessing. Please refer to the AWS docs for more details.

https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet-training-inference-code-template.html
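For a framework container (MXNet, scikit-learn, etc.) the inference script could look roughly like this. The function names follow the SageMaker serving convention; the artifact file names ("model.pkl", "encoders.pkl") and the JSON payload shape are assumptions for illustration:

```python
# Sketch of a SageMaker inference script: input_fn runs on every request
# before predict_fn, so the label encoding can live here instead of in Java.
import json
import os
import pickle

import pandas as pd


def model_fn(model_dir):
    """Load the trained model and the fitted label encoders (names assumed)."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    with open(os.path.join(model_dir, "encoders.pkl"), "rb") as f:
        encoders = pickle.load(f)
    return model, encoders


def input_fn(request_body, content_type):
    """Deserialize the request payload into a DataFrame."""
    if content_type == "application/json":
        return pd.DataFrame(json.loads(request_body))
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(data, model_and_encoders):
    """Apply the same encoding used at training time, then predict."""
    model, encoders = model_and_encoders
    data["documentId"] = encoders["documentId"].transform(data["documentId"])
    data["documentType"] = encoders["documentType"].transform(data["documentType"])
    return model.predict(data)
```

Because the encoders are loaded from the model directory, they must have been saved into the model.tar.gz at training time.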

answered Oct 28 '22 by Raman


One option is to put your pre-processing code in an AWS Lambda function and use that Lambda to call SageMaker's invoke-endpoint once the pre-processing is done. AWS Lambda supports Python, so it should be easy to reuse the code from your Jupyter notebook within that Lambda function. You can also use the Lambda to call external services such as DynamoDB for data-enrichment lookups.

You can find more information in the SageMaker documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/getting-started-client-app.html
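A handler for such a Lambda might look roughly like this. It is a sketch under assumptions: the ENDPOINT_NAME environment variable, the CSV payload shape, and the hard-coded label dict (standing in for a loaded LabelEncoder) are all placeholders, and it only runs inside AWS with the right permissions:

```python
# Sketch of a Lambda handler that pre-processes the payload and then
# invokes the SageMaker endpoint via the runtime API.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")


def lambda_handler(event, context):
    # Re-use the mapping produced at training time; a simple dict lookup
    # stands in here for loading the fitted LabelEncoder.
    doc_id_labels = {"a": 0, "b": 1}
    event["documentId"] = doc_id_labels[event["documentId"]]

    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],
        ContentType="text/csv",
        Body=f'{event["documentId"]},{event["documentType"]}',
    )
    return json.loads(response["Body"].read())
```

The Java client then calls the Lambda (directly or behind API Gateway) instead of the SageMaker endpoint, and the cleaning happens server-side.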

answered Oct 28 '22 by Guy