I've just started experimenting with AWS SageMaker and would like to load data from an S3 bucket into a pandas DataFrame in my SageMaker Python Jupyter notebook for analysis. I could use boto to grab the data from S3, but is there a more elegant way to do this within the SageMaker framework in my Python code? Thanks in advance for any advice.

Load S3 Data into AWS SageMaker Notebook

3 Answers

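import boto3  # not used by pd.read_csv; only needed for direct S3 API calls
import pandas as pd
from sagemaker import get_execution_role

# ARN of the notebook's IAM role; read_csv does not use it, but this role must allow reading the bucket
role = get_execution_role()
bucket = 'my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas reads s3:// URLs through the s3fs package, which must be installed
df = pd.read_csv(data_location)
df.head()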

answered Oct 23 '22 00:10

Chhoser

In the simplest case you don't need boto3, because you are only reading an object from the bucket.
Then it's even simpler:

import pandas as pd

bucket = 'my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas reads s3:// URLs through the s3fs package, which must be installed
df = pd.read_csv(data_location)
df.head()

But as Prateek stated, make sure your SageMaker notebook instance is configured to have access to S3. This is done at the configuration step, under Permissions > IAM role.
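For comparison, here is a minimal sketch of the boto3 route mentioned in the question. The helper name `load_csv_from_s3` and its `s3_client` parameter are hypothetical, introduced only for illustration; `get_object` and `pd.read_csv` are the real calls. If the notebook's IAM role lacks read access to the object, the `get_object` call is where an access-denied error would typically surface.

```python
import io

import pandas as pd


def load_csv_from_s3(bucket, key, s3_client=None, **read_csv_kwargs):
    """Download one S3 object and parse it as CSV (hypothetical helper)."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stand-in client.
        import boto3

        # Inside a SageMaker notebook, boto3 picks up the instance's IAM role.
        s3_client = boto3.client("s3")

    # get_object returns a dict whose "Body" is a readable stream of the object.
    response = s3_client.get_object(Bucket=bucket, Key=key)

    # Read the whole object into memory, then let pandas parse the bytes.
    return pd.read_csv(io.BytesIO(response["Body"].read()), **read_csv_kwargs)


# Example call (bucket and key taken from the answers above):
# df = load_csv_from_s3("my-bucket", "train.csv")
```

This reads the entire object into memory, so it suits files that fit comfortably in the notebook instance's RAM.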

answered Oct 23 '22 00:10

ivankeller

If you have a look here, it seems you can specify this in the InputDataConfig. Search for "S3DataSource" (ref) in the document. The first hit is even in Python, on page 25/26.
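As a rough sketch of what such an entry looks like: InputDataConfig is a list of channel dictionaries passed when creating a training job, each pointing at an S3 location through S3DataSource. The helper name `s3_channel` and the channel name "train" are assumptions for illustration; the dictionary keys follow the CreateTrainingJob request shape. Note that this describes input for a training job; it does not load anything into a DataFrame in the notebook.

```python
def s3_channel(name, s3_uri, content_type="text/csv"):
    """Build one InputDataConfig channel entry (hypothetical helper)."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                # "S3Prefix" means: use every object under the given URI prefix.
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                # Copy the full dataset to each training instance.
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
    }


# Bucket and key reuse the placeholder values from the answers above.
input_data_config = [s3_channel("train", "s3://my-bucket/train.csv")]
```

The resulting list would be supplied as the InputDataConfig argument of a training job request, alongside the other required settings such as the role and algorithm specification.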

answered Oct 23 '22 01:10

Jonatan

Tags: python, amazon-web-services, machine-learning, amazon-s3, amazon-sagemaker

A555h55
