Is there a simple way to access a data file stored on Amazon S3 directly from the command line?
I'm loosely following an online tutorial where the author links to the following URL:
s3://bml-data/churn-bigml-80.csv
It is a simple CSV file, but I can't open it using my web browser or with curl. The tutorial opens it with BigML, but I want to download the data for myself. Some googling tells me that there are a number of Python and Scala libraries designed for S3 access ... but it would be really nice to open or download the file more directly.
I use a Mac and am a big fan of Homebrew, so the perfect solution (for me) would work on this system.
Is there any good way to see the contents of an Amazon S3 bucket (that I don't own)?
The nature of the file (80% of a particular data set) makes me suspect that there may be a churn-bigml-20.csv file hiding somewhere out there. My instinct would be to try to curl or open the expected file ... the solution to the first question will allow me to check this hunch, but in an ugly way. If anyone knows of a way to remotely explore the contents of a specific S3 bucket, that would be very useful. Again, exploring Google and SO tells me that there are libraries for this, but a more direct approach would be useful.
You can also download the object to your local computer. In the Amazon S3 console, choose your S3 bucket, choose the file that you want to open or download, choose Actions, and then choose Open or Download. If you are downloading an object, specify where you want to save it.
In AWS Explorer, expand the Amazon S3 node, and double-click a bucket or open the context (right-click) menu for the bucket and choose Browse. In the Browse view of your bucket, choose Upload File or Upload Folder. In the File-Open dialog box, navigate to the files to upload, choose them, and then choose Open.
You can use the AWS CLI's cp command to copy files from an S3 bucket to your local system:
aws s3 cp s3://bucket/folder/file.txt .
Log in to the AWS Console using either the root account or an IAM user, and then expand Services. S3 is listed in the Storage group. Click S3 to launch the S3 console, which shows your existing buckets (if any) and options to create a new bucket.
The AWS Command Line Interface (CLI) is a unified tool to manage AWS services, including accessing data stored in Amazon S3.
The AWS Command Line Interface is available for Windows, Mac and Linux.
If the bucket owner has granted public permissions for ListBucket, then you can list the contents of the bucket, e.g.:
aws s3 ls s3://bml-data
If the bucket owner has granted public permissions for GetObject, then you can copy an object:
aws s3 cp s3://bml-data/churn-bigml-80.csv churn-bigml-80.csv
Both of these commands work successfully for me.
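An object with public GetObject permission can usually also be fetched anonymously over plain HTTPS, which answers the original curl question. The sketch below assumes the virtual-hosted-style endpoint https://BUCKET.s3.amazonaws.com/KEY (buckets with dots in their names, or in some regions, may need a regional or path-style URL instead). The function names s3_to_https and s3_exists are hypothetical helpers, not part of any AWS tool.

```shell
# POSIX sh. Hypothetical helper: rewrite an s3://bucket/key URL into the
# virtual-hosted-style HTTPS URL (assumed endpoint: BUCKET.s3.amazonaws.com).
s3_to_https() {
    rest=${1#s3://}              # strip the scheme -> bucket/key
    bucket=${rest%%/*}           # everything before the first slash
    case $rest in
        */*) key=${rest#*/} ;;   # everything after the first slash
        *)   key= ;;             # bare bucket: empty key (bucket root URL)
    esac
    printf 'https://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
}

# Hypothetical helper: exit 0 if an HTTP HEAD request for the object
# succeeds (-f makes curl fail on 4xx/5xx, e.g. 403 or 404).
s3_exists() {
    curl -sfI "$(s3_to_https "$1")" > /dev/null
}
```

With these defined, curl -O "$(s3_to_https s3://bml-data/churn-bigml-80.csv)" should download the file, and s3_exists s3://bml-data/churn-bigml-20.csv && echo found is one way to test the 20% hunch (a failure can mean either "missing" or "not public"). Relatedly, if you have no AWS credentials configured at all, the aws s3 ls and aws s3 cp commands above can be run anonymously by adding the AWS CLI's global --no-sign-request option.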
There's a neat tool called s3cmd that will do this.
brew install s3cmd
Configuring s3cmd requires an Amazon Web Services (AWS) account. Signing up is free, but you do need to sign up.
s3cmd --configure
Configuration involves specifying your access/secret key pair and a few other details (I used the defaults for everything). The configuration also asks whether to use HTTPS. Separately, if you want client-side encryption of the files you upload, you can install gpg with brew and set a few more configuration options at this point. Be warned: the gpg_passphrase that you use is stored in a local plain-text configuration file!
Now for the exciting bit: downloading my file to the desktop!
s3cmd get s3://bml-data/churn-bigml-80.csv ~/Desktop
Listing the contents of the remote bucket:
s3cmd ls s3://bml-data/
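If the bucket allows public ListBucket, S3 returns the listing as an XML ListBucketResult document from the bucket's root URL, so the contents can also be explored without configuring s3cmd at all. The rough sketch below assumes that root URL is https://bml-data.s3.amazonaws.com/ and pulls out the <Key> elements with sed. This is plain text matching rather than a real XML parser, and it ignores pagination (S3 returns at most 1000 keys per listing response). The function name s3_list_keys is hypothetical.

```shell
# POSIX sh. Hypothetical helper: read an S3 ListBucketResult XML document
# on stdin and print one object key per line. Text matching only, so it
# assumes keys contain no XML-escaped characters such as &amp;.
s3_list_keys() {
    # First sed: start a new line before every <Key> element.
    # Second sed: print only the text between <Key> and </Key>.
    sed 's/<Key>/\
<Key>/g' | sed -n 's/^<Key>\([^<]*\)<\/Key>.*/\1/p'
}
```

Usage would then be something like curl -s https://bml-data.s3.amazonaws.com/ | s3_list_keys, optionally piped through grep churn to look for a churn-bigml-20.csv. If listing is not public, S3 returns an AccessDenied error document instead, and the function prints nothing.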
This is beyond the scope of the question but seems worth mentioning: s3cmd can also do other things, like putting data into a bucket (and making it public with the -P flag) and deleting files. The man page has more information:
s3cmd -P put ~/Desktop/my-file.png s3://mybucket/
s3cmd del s3://mybucket/my-file-to-delete.png
man s3cmd
Thanks to Neil Gee for his tutorial on s3cmd.