
Unable to upload a file from sagemaker notebook to S3

I am attempting to upload my cleaned data (split with k-fold) to S3 so that I can use SageMaker to create a model from it (SageMaker expects training and test data in S3). However, whenever I attempt to upload the CSV to S3, the code runs without error, but I don't see the file in S3.

I have tried changing which folder I access in SageMaker and uploading different types of files, none of which worked. I have also tried the approaches in similar Stack Overflow posts without success.

Also note that I am able to upload my CSV to S3 manually, just not through SageMaker programmatically.

The code below is what I currently have to upload to S3, copied directly from the AWS documentation for uploading files with SageMaker.

import io
import csv
import boto3

#key = "{}/{}/examples".format(prefix,data_partition_name)
#url = 's3n://{}/{}'.format(bucket, key)
name = boto3.Session().resource('s3').Bucket('nc-demo-sagemaker').name
print(name)
boto3.Session().resource('s3').Bucket('nc-demo-sagemaker').upload_file('train', '/')
print('Done writing to {}'.format('sagemaker bucket'))

I expect that running this snippet uploads the training and test data to the folder I want, for use in creating SageMaker models.

A. Nigam asked Jun 28 '19 16:06


3 Answers

I always upload files from a SageMaker notebook instance to S3 using this code. It uploads the entire contents of the specified folder to S3; alternatively, you can specify a single file to upload.

import sagemaker


s3_path_to_data = sagemaker.Session().upload_data(bucket='my_awesome_bucket', 
                                                  path='local/path/data/train', 
                                                  key_prefix='my_crazy_project_name/data/train')

I hope this helps!

Ilai Waimann answered Nov 15 '22 04:11


The issue may be a lack of proper S3 permissions for your SageMaker notebook.

Your IAM user has its own permissions, which is what lets you manually upload the CSV via the S3 console.

SageMaker notebooks, however, run under their own IAM role, which needs S3 permissions added explicitly. You can see this in the SageMaker console; the default IAM role is prefixed with SageMaker-XXX. You can either edit this SageMaker-created IAM role, or attach existing IAM policies (for example, one that grants read/write access to S3) to it.
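If you do edit the role, a minimal inline policy granting read/write access to the question's bucket could look roughly like this sketch (expressed as a Python dict so it can be attached with boto3; the role name and policy name in the commented call are hypothetical placeholders):

```python
import json

# Sketch of an inline policy scoped to the bucket from the question.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::nc-demo-sagemaker",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::nc-demo-sagemaker/*",
        },
    ],
}

print(json.dumps(policy, indent=2))

# Hypothetical attachment call (requires IAM permissions and the real role name):
# import boto3
# boto3.client('iam').put_role_policy(
#     RoleName='SageMaker-XXX',
#     PolicyName='notebook-s3-access',
#     PolicyDocument=json.dumps(policy),
# )
```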

Nick Walsh answered Nov 15 '22 04:11


Import the sagemaker library and use a SageMaker session to upload and download files to/from an S3 bucket.

import sagemaker

sagemaker_session = sagemaker.Session(default_bucket='MyBucket')
upload_data = sagemaker_session.upload_data(path='local_file_path', key_prefix='my_prefix')

print('upload_data : {}'.format(upload_data))
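For what it's worth, `upload_data` returns the S3 URI of the uploaded location, and `Session.download_data` is the counterpart for pulling data back down. A rough sketch of the round trip, using the illustrative bucket and prefix names from the snippet above (the network calls are commented out because they need AWS credentials):

```python
bucket = 'MyBucket'    # illustrative bucket name
prefix = 'my_prefix'   # illustrative key prefix

# upload_data returns the S3 URI of the uploaded location:
expected_uri = 's3://{}/{}'.format(bucket, prefix)
print(expected_uri)

# import sagemaker
# sagemaker_session = sagemaker.Session(default_bucket=bucket)
# upload_data = sagemaker_session.upload_data(path='local_file_path', key_prefix=prefix)
# assert upload_data == expected_uri
# sagemaker_session.download_data(path='local_dir', bucket=bucket, key_prefix=prefix)
```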
Biranchi answered Nov 15 '22 04:11