
How to download files from S3 given the file path using boto3 in Python

This is pretty basic, but I am not able to download files given an S3 path.

For example, I have this path: s3://name1/name2/file_name.txt

import boto3
locations = ['s3://name1/name2/file_name.txt']
s3_client = boto3.client('s3')
bucket = 'name1'
prefix = 'name2'

for file in locations:
    s3_client.download_file(bucket, 'file_name.txt', 'my_local_folder')

I am getting this error: botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found

The file exists, because I can download it with the AWS CLI using the S3 path s3://name1/name2/file_name.txt.

asked Apr 14 '18 07:04 by Atihska



2 Answers

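The object key has to include the prefix: for s3://name1/name2/file_name.txt the bucket is name1 and the key is name2/file_name.txt, not just file_name.txt. Keep a list of full keys, then modify your code as shown in the documentation: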

import os
import boto3
import botocore

files = ['name2/file_name.txt']

bucket = 'name1'

s3 = boto3.resource('s3')

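for file in files:
    try:
        # Save each object in the current directory under its base file name.
        s3.Bucket(bucket).download_file(file, os.path.basename(file))
    except botocore.exceptions.ClientError as e:
        # A 404 means no object exists under this exact key.
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise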
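
Since the question starts from full s3:// paths rather than separate bucket and key values, it may help to split each path first. The following is a minimal sketch, not part of the original answer: split_s3_uri and download_all are hypothetical helper names, and the code assumes every path has the form s3://bucket/key.

```python
import os


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError("not an S3 URI: {!r}".format(uri))
    # Everything up to the first '/' is the bucket; the rest is the full key.
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket or not key:
        raise ValueError("S3 URI needs both a bucket and a key: {!r}".format(uri))
    return bucket, key


def download_all(uris, dest_dir="."):
    """Download each URI into dest_dir, named after the last key component."""
    import boto3  # imported here so split_s3_uri can be used without boto3

    s3_client = boto3.client("s3")
    for uri in uris:
        bucket, key = split_s3_uri(uri)
        # The third argument is a local file path, not a directory.
        local_path = os.path.join(dest_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
```

For the path in the question, split_s3_uri returns the bucket name1 and the key name2/file_name.txt, which is the key that download_file expects.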
answered Nov 06 '22 06:11 by Burhan Khalid



You may also need to authenticate explicitly. There are several ways to do this, but creating a session is simple and fast:

from boto3.session import Session

bucket_name = 'your_bucket_name'
folder_prefix = 'your/path/to/download/files'
credentials = 'credentials.txt'

with open(credentials, 'r', encoding='utf-8') as f:
    # The file holds a single line: <access key id>:<secret access key>
    access_key, secret_key = f.readline().strip().split(':', 1)

session = Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key
)

s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)

for s3_file in bucket.objects.filter(Prefix=folder_prefix):
    file_object = s3_file.key
    # Skip "folder" placeholder keys, which end in '/' and have no file name.
    if file_object.endswith('/'):
        continue
    file_name = file_object.split('/')[-1]
    print('Downloading file {} ...'.format(file_object))
    bucket.download_file(file_object, '/tmp/{}'.format(file_name))
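
One thing to watch in the loop above: every object is saved under its last path component, so two keys with the same file name under different sub-prefixes would overwrite each other in /tmp. A possible way around this, sketched here under the assumption that the prefix ends at a '/' boundary, is to keep the part of the key after the prefix as a relative local path. local_path_for_key is a hypothetical helper, not a boto3 function.

```python
import os
import posixpath


def local_path_for_key(key, prefix, dest_dir):
    """Map an S3 key to a path under dest_dir, keeping the part after prefix.

    Returns None when the key names no file (for example a key ending in '/').
    """
    if not key.startswith(prefix):
        raise ValueError("key {!r} does not start with {!r}".format(key, prefix))
    # Drop the prefix; if nothing is left, fall back to the key's base name.
    relative = key[len(prefix):].lstrip("/") or posixpath.basename(key)
    if not relative or relative.endswith("/"):
        return None
    # S3 keys always use '/', so split on it and join with the local separator.
    return os.path.join(dest_dir, *relative.split("/"))
```

Before calling download_file with the returned path, the parent directory has to exist, for example via os.makedirs(os.path.dirname(path), exist_ok=True).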

The credentials.txt file must contain a single line with the access key ID and the secret access key joined by a colon, for example:

~$ cat credentials.txt
AKIAIO5FODNN7EXAMPLE:ABCDEF+c2L7yXeGvUyrPgYsDnWRRC1AYEXAMPLE

Protect this file well on your host: give read-only permission only to the user who runs this program. I hope it works for you; it works perfectly for me.
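
The permission advice can also be applied from Python. This is a minimal sketch assuming a POSIX system (the shell equivalent would be chmod 400 credentials.txt); both function names are hypothetical.

```python
import os
import stat


def restrict_to_owner_read(path):
    """Set the file mode to 0o400: readable by the owner, nothing else."""
    os.chmod(path, stat.S_IRUSR)


def is_owner_read_only(path):
    """Return True if the permission bits of the file are exactly 0o400."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == stat.S_IRUSR
```

A program could call is_owner_read_only('credentials.txt') before reading the keys and refuse to continue if the file is accessible to other users.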

answered Nov 06 '22 06:11 by JavDomGom