I have a huge (~6 GB) file on Amazon S3 and want to get the first 100 lines of it without having to download the whole thing. Is this possible?
Here's what I'm doing now:
aws s3 cp s3://foo/bar - | head -n 100
But this takes a while to execute. I'm confused -- shouldn't head
close the pipe once it's read enough lines, causing aws s3 cp
to crash with a BrokenPipeError before it has time to download the entire file?
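For comparison, a purely local pipeline behaves the way I expect: head exits after enough lines, and the producer is killed by SIGPIPE right away.

# 'yes' would run forever, but it receives SIGPIPE as soon as 'head'
# closes its end of the pipe, so this returns instantly:
yes | head -n 3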
Using the Range HTTP header in a GET request, you can retrieve a specific range of bytes from an object stored in Amazon S3 (see http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html).
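As a sketch (the bucket and key come from the question, and the 1 MiB range is an arbitrary choice), you can exercise the Range header directly with curl against a presigned URL:

# Presign a GET URL for the object so curl can fetch it without AWS request
# signing; -r 0-1048575 sends "Range: bytes=0-1048575", i.e. the first 1 MiB:
url=$(aws s3 presign s3://foo/bar)
curl -s -r 0-1048575 "$url" | head -n 100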
If you use the AWS CLI, you can use aws s3api get-object --range bytes=0-xxx (see http://docs.aws.amazon.com/cli/latest/reference/s3api/get-object.html).
It is specified as a byte range rather than a number of lines, but it lets you retrieve just the first part of the file and avoid downloading the full object.
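A complete invocation might look like this (again a sketch: bucket foo and key bar are taken from the question, and 1 MiB is just a guess at how many bytes cover 100 lines):

# Download only the first 1 MiB of the object into a local file,
# then take the first 100 lines from it:
aws s3api get-object --bucket foo --key bar --range bytes=0-1048575 first-mb.txt
head -n 100 first-mb.txt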