How to download data from Amazon S3 Requester Pays buckets?

Tags:

amazon-web-services

amazon-s3

I have been struggling for about a week to download arXiv articles as mentioned here: http://arxiv.org/help/bulk_data_s3#src.

I have tried lots of things, including s3Browser and s3cmd. I am able to log in to my own buckets, but I am unable to download data from the arXiv bucket.

I tried:

s3cmd get s3://arxiv/pdf/arXiv_pdf_1001_001.tar

See:

$ s3cmd get s3://arxiv/pdf/arXiv_pdf_1001_001.tar


s3://arxiv/pdf/arXiv_pdf_1001_001.tar -> ./arXiv_pdf_1001_001.tar  [1 of 1]
s3://arxiv/pdf/arXiv_pdf_1001_001.tar -> ./arXiv_pdf_1001_001.tar  [1 of 1]
ERROR: S3 error: Unknown error

s3cmd get with x-amz-request-payer:requester

It gave me the same error again:

$ s3cmd get --add-header="x-amz-request-payer:requester" s3://arxiv/pdf/arXiv_pdf_manifest.xml
s3://arxiv/pdf/arXiv_pdf_manifest.xml -> ./arXiv_pdf_manifest.xml  [1 of 1]
s3://arxiv/pdf/arXiv_pdf_manifest.xml -> ./arXiv_pdf_manifest.xml  [1 of 1]
ERROR: S3 error: Unknown error

Copying

I have tried copying files from that folder too.

$ aws s3 cp s3://arxiv/pdf/arXiv_pdf_1001_001.tar .

A client error (403) occurred when calling the HeadObject operation: Forbidden
Completed 1 part(s) with ... file(s) remaining

This probably means that I made a mistake. The problem is that I don't know what to add, or how, to indicate that I agree to pay for the download.

I can't figure out what I should do to download data from S3. I have been reading a lot of AWS documentation, but I can't find a precise solution to my problem.

How can I bulk download the arXiv data?


asked Feb 28 '15 17:02

pg2455

1 Answer

For me, the problem was that my IAM user didn't have enough permissions. Attaching the AmazonS3FullAccess policy to that user solved it.

Hope this saves someone some time.
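In case it helps, here is a rough sketch of the two steps with the AWS CLI: attaching the managed policy to the IAM user, then downloading with the requester-pays flag set explicitly. The function names, the `run` helper, and the `DRY_RUN` switch are my own illustration, not anything from AWS, and the user name you pass in is a placeholder. It assumes a reasonably recent AWS CLI, where `aws s3 cp` accepts `--request-payer requester`, and credentials that are allowed to change IAM policies.

```shell
# Sketch only. Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach the AWS-managed AmazonS3FullAccess policy to an IAM user.
# $1 = IAM user name (placeholder, use your own).
grant_s3_access() {
    run aws iam attach-user-policy \
        --user-name "$1" \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
}

# Download one object from a Requester Pays bucket.
# $1 = S3 URI, $2 = destination (defaults to the current directory).
# --request-payer requester confirms that you accept the transfer charges.
fetch_requester_pays() {
    run aws s3 cp "$1" "${2:-.}" --request-payer requester
}
```

For example, `fetch_requester_pays s3://arxiv/pdf/arXiv_pdf_manifest.xml` would fetch the manifest into the current directory. AmazonS3FullAccess is broader than a download strictly needs, so a narrower read-only policy may be enough, but I have only tried the full-access one.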

answered Sep 23 '22 20:09

Alan Wagner
