
AWS SageMaker data preparation

I am trying to understand how to implement a machine learning algorithm in AWS SageMaker where the preprocessing and postprocessing are heavy tasks. The main idea is to read data from S3: each time the data in S3 changes, CloudWatch triggers a Lambda function that invokes a SageMaker endpoint. The problem is that, once the algorithm is trained, I need to preprocess the new data (custom NLP preprocessing) before predicting on it. Once the algorithm has made its prediction, I need to post-process that prediction and then send the post-processed data to S3. The idea I have in mind is to create a Docker container with this layout:

├── text_classification/                - ml scripts
|   ├── app.py                            
|   ├── config.py                         
|   ├── data.py                           
|   ├── models.py                         
|   ├── predict.py                        - pre-processing data and post-processing data
|   ├── train.py                          
|   ├── utils.py                          

So I will do the pre-processing and the post-processing inside "predict.py". When I invoke the endpoint for a prediction, that script will run. Is this correct?
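To make the idea concrete, here is a minimal sketch of what such a "predict.py" could look like. It assumes a JSON request body with a "text" field; the function names, the tokenizer, and the `model` callable are hypothetical placeholders, not SageMaker APIs. In a custom container, SageMaker sends inference requests to the container's `/invocations` route, so the web server in "app.py" would presumably call something like `handle_request` from there.

```python
import json
import re


def preprocess(text):
    """Placeholder NLP step: lowercase, then split on non-alphanumeric runs."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def postprocess(scores, labels):
    """Turn raw per-class scores into the highest-scoring label."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"label": labels[best], "score": scores[best]}


def handle_request(body, model, labels):
    """Run preprocessing, prediction and postprocessing for one request.

    `model` is any callable mapping a token list to a list of scores
    (a hypothetical stand-in for the trained classifier).
    """
    payload = json.loads(body)
    tokens = preprocess(payload["text"])
    scores = model(tokens)
    return json.dumps(postprocess(scores, labels))
```

Keeping the three steps as separate functions means the heavy pre- and post-processing can be unit-tested without starting the container or the endpoint.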

lgndrzzz asked Jun 10 '26 22:06



1 Answer

Take a look at using Step Functions to orchestrate the entire workflow for you.

Have the CloudWatch event trigger a Step Functions state machine that does the following:

  • Preprocess the data.
  • Create predictions (if it's a batch process, why not use batch transform instead?).
  • Use a retry loop to check whether inference has completed.
  • Once inference has completed, run the post-processing and copy the results to S3.
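
As a rough sketch of the steps above, the state machine definition (Amazon States Language) can be built as a plain Python dict and serialized to JSON. Everything specific here is an assumption for illustration: the two Lambda ARNs, the model name, the instance type, and the `job_name`, `input_uri` and `output_uri` fields that the preprocessing Lambda is assumed to return. The sketch uses the `.sync` form of the SageMaker batch transform integration, which makes Step Functions wait for the job to finish, so it stands in for a hand-written retry loop.

```python
import json


def build_state_machine(preprocess_fn_arn, postprocess_fn_arn, model_name):
    """Return an illustrative Amazon States Language definition as a dict."""
    return {
        "Comment": "Preprocess, batch transform, postprocess (illustrative)",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": preprocess_fn_arn, "Payload.$": "$"},
                # Assumed: the Lambda returns job_name, input_uri, output_uri.
                "OutputPath": "$.Payload",
                "Next": "BatchTransform",
            },
            "BatchTransform": {
                "Type": "Task",
                # ".sync" makes the state wait until the transform job ends.
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    "TransformJobName.$": "$.job_name",
                    "ModelName": model_name,
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri.$": "$.input_uri",
                            }
                        },
                        "ContentType": "text/csv",
                    },
                    "TransformOutput": {"S3OutputPath.$": "$.output_uri"},
                    "TransformResources": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.large",
                    },
                },
                # Keep the original input and attach the job result to it.
                "ResultPath": "$.transform",
                "Next": "Postprocess",
            },
            "Postprocess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": postprocess_fn_arn, "Payload.$": "$"},
                "End": True,
            },
        },
    }


definition = build_state_machine("arn:pre", "arn:post", "my-model")
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is what would be supplied as the state machine definition; the placeholder ARNs above would need to be replaced with real ones.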
Chris Williams answered Jun 12 '26 12:06



