
How to get the first 100 lines of a file on S3?

Tags:

amazon-s3

I have a huge (~6 GB) file on Amazon S3 and want to get the first 100 lines of it without having to download the whole thing. Is this possible?

Here's what I'm doing now:

aws s3 cp s3://foo/bar - | head -n 100

But this takes a while to execute. I'm confused: shouldn't head close the pipe once it has read enough lines, causing aws s3 cp to crash with a BrokenPipeError before it has time to download the entire file?

Eli Rose asked Aug 31 '16 20:08



1 Answer

Using the Range HTTP header in a GET request, you can retrieve a specific range of bytes from an object stored in Amazon S3 (see http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html).

If you use the AWS CLI, you can use aws s3api get-object --range bytes=0-xxx (see http://docs.aws.amazon.com/cli/latest/reference/s3api/get-object.html).

This selects by byte count rather than by number of lines, but it lets you retrieve just the start of the file and so avoid downloading the full object.
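Since a byte range does not map directly to a line count, one way to bridge the gap is to request fixed-size ranges from the start of the object until enough newlines have arrived. The following is only a rough sketch of that idea, not a definitive implementation: first_lines, s3_fetcher and the 64 KiB chunk size are names and values made up for illustration, and the S3 part assumes boto3 is installed and credentials are configured.

```python
from typing import Callable, List


def first_lines(fetch_range: Callable[[int, int], bytes],
                n: int = 100,
                chunk_size: int = 64 * 1024) -> List[bytes]:
    """Return up to the first n lines, fetching chunk_size bytes per request.

    fetch_range(start, end) must return the bytes in the inclusive range
    [start, end], or fewer bytes (possibly none) near the end of the object.
    """
    buf = b""
    start = 0
    while buf.count(b"\n") < n:
        chunk = fetch_range(start, start + chunk_size - 1)
        if not chunk:
            break  # nothing left to read
        buf += chunk
        start += len(chunk)
        if len(chunk) < chunk_size:
            break  # short read: reached the end of the object
    return buf.splitlines()[:n]


def s3_fetcher(bucket: str, key: str) -> Callable[[int, int], bytes]:
    """Hypothetical helper: build a fetch_range function backed by S3."""
    import boto3  # third-party, imported lazily so first_lines has no dependency
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def fetch(start: int, end: int) -> bytes:
        try:
            resp = s3.get_object(Bucket=bucket, Key=key,
                                 Range=f"bytes={start}-{end}")
        except ClientError as err:
            # S3 rejects a range that starts at or past the end of the object.
            if err.response["Error"]["Code"] == "InvalidRange":
                return b""
            raise
        return resp["Body"].read()

    return fetch
```

For the object in the question, this would be called as first_lines(s3_fetcher("foo", "bar"), 100). Each request transfers at most chunk_size bytes, so only as many chunks as are needed to cover 100 lines are downloaded.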

Frederic Henri answered Oct 02 '22 14:10