IOError in Boto3 download_file

Background

I am using the following Boto3 code to download a file from S3.

import uuid

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    print(key)
    if key.find('/') < 0:
        # File is uploaded outside any folder
        if len(key) > 4 and key[-5:].lower() == '.json':
            download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
    else:
        # File is uploaded inside a folder
        download_path = '/tmp/{}/{}'.format(uuid.uuid4(), key)

Whenever a new file is uploaded to the S3 bucket, this code is triggered and downloads the newly uploaded file.

This code works fine when the file is uploaded outside any folder.

However, when I upload a file inside a directory, an IOError occurs. Here is the error I am encountering:

[Errno 2] No such file or directory: /tmp/316bbe85-fa21-463b-b965-9c12b0327f5d/test1/customer1.json.586ea9b8: IOError

test1 is the directory inside my S3 bucket where customer1.json is uploaded.

Query

Any thoughts on how to resolve this error?

Rohan asked Sep 19 '16 09:09


2 Answers

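The error is raised because you are trying to download and save the file into a directory that does not exist. Create the directory before downloading the file. Because the key itself contains a folder (test1/ in your example), use os.makedirs rather than os.mkdir so that every missing level of the path is created.

# ...
else:
    item_uuid = str(uuid.uuid4())
    download_path = '/tmp/{}/{}'.format(item_uuid, key)  # File is uploaded inside a folder
    os.makedirs(os.path.dirname(download_path))  # e.g. creates /tmp/<uuid>/test1 (requires "import os")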

Note: It's better to use os.path.join() when working with file system paths. It takes the path components as separate arguments, not as a list. So the code above could be rewritten as:

# ...
else:
    item_uuid = str(uuid.uuid4())
    download_path = os.path.join('/tmp', item_uuid, key)
    os.makedirs(os.path.dirname(download_path))
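Putting the two points together, here is a minimal sketch of a helper that builds the local path and creates its parent directories. The function name build_download_path and the tmp_root parameter are made up for illustration, and exist_ok=True assumes Python 3.

```python
import os
import uuid


def build_download_path(key, tmp_root='/tmp'):
    """Return a unique local path for an S3 key and create its parent directories.

    Hypothetical helper for illustration; not part of Boto3.
    """
    # A fresh UUID directory keeps downloads of same-named keys apart.
    # Leading slashes are stripped so os.path.join does not discard tmp_root.
    download_path = os.path.join(tmp_root, str(uuid.uuid4()), key.lstrip('/'))
    # Create every missing level, e.g. <tmp_root>/<uuid>/test1 for 'test1/customer1.json'.
    os.makedirs(os.path.dirname(download_path), exist_ok=True)
    return download_path
```

The returned path can then be passed as the Filename argument of the S3 client's download_file(Bucket, Key, Filename) call. The same helper covers keys with and without a folder, so the if/else branch is no longer needed.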

The error may also be raised if you include '/tmp/' in the path of the file in the S3 bucket. Do not include the tmp folder there, as it most likely does not exist on S3. These articles may help you check that you are on the right track:

  • Amazon S3 upload and download using Python/Django

  • Python s3 examples

Andriy Ivaneyko answered Oct 13 '22 18:10


I faced the same issue, and the error message caused a lot of confusion because of the random string appended after the file name (it appears to belong to a temporary file written during the download). In my case the real cause was that the directory in the download path didn't exist.
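To see that the missing directory alone is enough to produce "[Errno 2] No such file or directory", here is a small sketch that does not involve S3 at all. The function name fails_with_missing_dir is invented for this example, and the code assumes Python 3, where IOError is an alias of OSError.

```python
import errno


def fails_with_missing_dir(path):
    """Return True if opening path for writing fails with ENOENT (Errno 2)."""
    try:
        # Opening a file for writing does not create missing parent directories.
        with open(path, 'wb') as handle:
            handle.write(b'')
    except OSError as exc:
        return exc.errno == errno.ENOENT
    return False
```

Calling it with a path whose parent folder does not exist returns True, while the same file name directly inside an existing directory returns False.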

Yankee answered Oct 13 '22 20:10