I am trying to follow the tutorial here to implement a custom inference pipeline for feature preprocessing. It uses the SageMaker Python SDK's SKLearn estimator to bring in a custom preprocessing pipeline from a script. For example:
from sagemaker.sklearn.estimator import SKLearn

script_path = 'preprocessing.py'

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session)
However, I can't find a way to send multiple files. The reason I need multiple files is that a custom class used in the sklearn pipeline needs to be imported from a custom module. When the class is defined in the same preprocessing.py file instead, it raises AttributeError: module '__main__' has no attribute 'CustomClassName' (at least I think it's related to the way pickle works).
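For what it's worth, here is a minimal sketch (runnable outside SageMaker) of why the pickle round-trip fails; the transformer body is illustrative:

# pickle records classes by module path. When preprocessing.py runs as the
# main module during training, CustomClassName is saved as
# __main__.CustomClassName, so any other process that loads the artifact
# (e.g. the inference container) cannot resolve it.
import joblib
from sklearn.base import BaseEstimator, TransformerMixin

class CustomClassName(BaseEstimator, TransformerMixin):
    # Illustrative no-op transformer.
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

joblib.dump(CustomClassName(), "model.joblib")
# In a fresh interpreter that never defined the class:
#   joblib.load("model.joblib")
#   AttributeError: module '__main__' has no attribute 'CustomClassName'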
Anyone know if sending multiple files is even possible?
Newbie to SageMaker, thanks!
You can use Amazon SageMaker to train and deploy a model using custom Scikit-learn code. The SageMaker Python SDK Scikit-learn estimators and models, together with the SageMaker open-source Scikit-learn containers, make it easier to write a Scikit-learn script and run it in SageMaker.
An inference pipeline is an Amazon SageMaker model composed of a linear sequence of two to fifteen containers that process requests for inferences on data.
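As a sketch of how that maps to code: the SDK's PipelineModel chains a fitted preprocessor with a second model. Here role and sklearn_preprocessor come from your snippet, and linear_model is a hypothetical second estimator that has also been trained:

from sagemaker.pipeline import PipelineModel

# Assumes both estimators have already been fit(); linear_model is a
# hypothetical second estimator trained on the preprocessed features.
preprocessor_model = sklearn_preprocessor.create_model()
predictor_model = linear_model.create_model()

pipeline_model = PipelineModel(
    name="preprocess-then-predict",
    role=role,
    models=[preprocessor_model, predictor_model])  # containers run in order

# Deploying creates one endpoint; each request passes through the
# preprocessing container first, then the prediction container.
pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c4.xlarge")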
This is where an Amazon SageMaker endpoint comes in: a fully managed service that lets you make real-time inferences via a REST API.
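For illustration, invoking such an endpoint with boto3; the endpoint name and CSV payload below are placeholders:

import boto3

runtime = boto3.client("sagemaker-runtime")

# EndpointName and the CSV body are placeholder values.
response = runtime.invoke_endpoint(
    EndpointName="my-inference-pipeline",
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2")

print(response["Body"].read().decode("utf-8"))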
At the moment SageMaker Inference has four main options: Real-Time Inference, Batch Transform, Asynchronous Inference, and now Serverless Inference.
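For example, switching to Serverless Inference is a matter of passing a serverless config at deploy time. A sketch, assuming model is an already-built SageMaker Model object and the capacity values are illustrative:

from sagemaker.serverless import ServerlessInferenceConfig

# Deploy without provisioning instances; memory and concurrency values
# here are illustrative, not recommendations.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=5)

predictor = model.deploy(serverless_inference_config=serverless_config)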
There's a source_dir parameter which will "lift" a directory of files into the container and put it on your import path.
Your entry point script should go in that directory too and be referenced relative to it.
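A sketch of how that looks for your case; the file and directory names are illustrative:

# Suggested layout (names are illustrative):
#
#   source/
#       preprocessing.py   # entry point; starts with: from custom_module import CustomClassName
#       custom_module.py   # defines CustomClassName
#
from sagemaker.sklearn.estimator import SKLearn

sklearn_preprocessor = SKLearn(
    entry_point="preprocessing.py",   # path relative to source_dir
    source_dir="source",              # whole directory is uploaded and put on sys.path
    role=role,
    train_instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session)

Because the class is now pickled as custom_module.CustomClassName rather than __main__.CustomClassName, the inference container can import the module and unpickle the model.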