AWS SageMaker Minimum Configuration

Tags:

amazon-sagemaker

Why do I need a container for AWS SageMaker? If I want to run Scikit Learn in a SageMaker Jupyter notebook for self-learning purposes, do I still need to configure a container for it?

What is the minimum SageMaker configuration I need if I just want to learn Scikit Learn? For example, I want to run Scikit Learn's Decision Tree algorithm with a set of training data and a set of test data. What do I need to do on SageMaker to perform these tasks? Thanks.


asked May 12 '18 04:05

David293836

2 Answers

You don't need much: just an AWS account with the appropriate permissions on your role. Inside the AWS SageMaker console you can launch a notebook instance with one click. Sklearn is preinstalled there and you can use it out of the box. No special container is needed.

At a minimum, you just need an AWS account with permissions to create EC2 instances and to read from and write to your S3 buckets. That's all, just try it. :)

Use this as a starting point: Amazon SageMaker – Accelerating Machine Learning

You can also access it via the Jupyter Terminal

answered Oct 16 '22 07:10

Pablo

If you are not concerned about using SageMaker's training and deployment features, then you just need to create a new conda_python3 notebook and import sklearn.

I too was confused about how to take advantage of SageMaker's train/deploy features with Scikit Learn. The best and most up-to-date explanation seems to be:
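As a rough illustration of the notebook-only route, here is a minimal sketch of a Decision Tree trained and tested inside a conda_python3 notebook. It assumes the Iris dataset bundled with scikit-learn as a stand-in for your own training and test data; the helper name train_and_evaluate is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(random_state=0):
    """Fit a decision tree on a training split and score it on a test split."""
    # Assumption: the bundled Iris data stands in for your own dataset.
    X, y = load_iris(return_X_y=True)

    # Hold out 25% of the rows as the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    model = DecisionTreeClassifier(random_state=random_state)
    model.fit(X_train, y_train)

    # Accuracy on rows the model has not seen during training.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    _, accuracy = train_and_evaluate()
    print(f"Test accuracy: {accuracy:.3f}")
```

Everything here runs on the notebook instance itself, so no container, training job, or endpoint is involved.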

https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/sklearn/README.rst

The brief summary is:

You save your training data to an S3 bucket.
Create a standalone Python script that does your training and serializes the trained model to a file, which is then saved to an S3 bucket.
In a notebook on SageMaker, you import the SageMaker SDK and point it to your training script and data. SageMaker will then temporarily create an AWS instance to train the model.
Once training finishes, that instance is automatically destroyed.
Finally, you use the SageMaker SDK to deploy the trained model to another AWS instance. This also automatically creates an endpoint that can be called to make predictions.
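The standalone training script from the steps above might look roughly like the sketch below. It is not taken from the README: the file name train.csv, the label-in-first-column layout, the channel name "train", and the helper train_and_save are all assumptions made for illustration. The script relies on the SM_CHANNEL_TRAIN and SM_MODEL_DIR environment variables that SageMaker sets inside a training job, and it writes the model to the model directory, from which SageMaker uploads it to S3.

```python
import os

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

MODEL_FILE = "model.joblib"  # hypothetical file name for the serialized model


def train_and_save(train_dir, model_dir):
    """Train a decision tree from train_dir/train.csv and save it in model_dir."""
    # Assumption: a headerless CSV with the label in the first column.
    data = np.loadtxt(os.path.join(train_dir, "train.csv"), delimiter=",", ndmin=2)
    y = data[:, 0].astype(int)
    X = data[:, 1:]

    model = DecisionTreeClassifier(random_state=0).fit(X, y)

    # Serialize the trained model to a file in the model directory.
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, MODEL_FILE)
    joblib.dump(model, model_path)
    return model_path


def model_fn(model_dir):
    """Load the saved model; the hosting container calls this at deploy time."""
    return joblib.load(os.path.join(model_dir, MODEL_FILE))


# Run only inside a training job, where SageMaker sets these variables.
# "train" is an assumed channel name chosen when starting the job.
if __name__ == "__main__" and "SM_MODEL_DIR" in os.environ:
    train_and_save(os.environ["SM_CHANNEL_TRAIN"], os.environ["SM_MODEL_DIR"])
```

From the notebook, the SDK's SKLearn estimator is then pointed at this script as its entry point and at the S3 location of the data; the README linked above lists the exact arguments for training and deployment.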

answered Oct 16 '22 05:10

Guy C

