Background
I am using the following Boto3 code to download a file from S3.
import uuid

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = record['s3']['object']['key']
    print(key)
    if key.find('/') < 0:  # File is uploaded outside any folder
        if len(key) > 4 and key[-5:].lower() == '.json':
            download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
    else:  # File is uploaded inside a folder
        download_path = '/tmp/{}/{}'.format(uuid.uuid4(), key)
Whenever a new file is uploaded to the S3 bucket, this code is triggered and downloads the newly uploaded file.
This works fine when the file is uploaded outside any folder.
However, when I upload a file inside a directory, an IOError occurs. Here is a dump of the error I am encountering.
[Errno 2] No such file or directory: /tmp/316bbe85-fa21-463b-b965-9c12b0327f5d/test1/customer1.json.586ea9b8: IOError
test1 is the directory inside my S3 bucket where customer1.json is uploaded.
Query
Any thoughts on how to resolve this error?
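To see why only the folder case fails, here is a small sketch that reproduces the path logic from the question (ignoring the .json check) and checks whether the parent directory of each computed path exists. build_download_path is a hypothetical helper name, not part of the original code.

```python
import os
import uuid


def build_download_path(key):
    """Reproduce the question's path logic (hypothetical helper)."""
    if '/' not in key:
        # Outside any folder: /tmp/<uuid><key>, whose parent is /tmp itself
        return '/tmp/{}{}'.format(uuid.uuid4(), key)
    # Inside a folder: /tmp/<uuid>/<folder>/<file>, whose parent was never created
    return '/tmp/{}/{}'.format(uuid.uuid4(), key)


for key in ('customer1.json', 'test1/customer1.json'):
    parent = os.path.dirname(build_download_path(key))
    print(key, '->', parent, 'exists' if os.path.isdir(parent) else 'MISSING')
```

In the first case the file lands directly in /tmp, which already exists; in the second the parent /tmp/<uuid>/test1 has never been created, so the download has nowhere to write.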
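The error is raised because you are trying to download and save the file into a directory that does not exist. With the key test1/customer1.json the download path is /tmp/<uuid>/test1/customer1.json, so both /tmp/<uuid> and /tmp/<uuid>/test1 must exist before downloading. os.mkdir creates only a single directory level, so use os.makedirs, which also creates any missing parent directories:
# ...
else:  # File is uploaded inside a folder
    item_uuid = str(uuid.uuid4())
    download_path = '/tmp/{}/{}'.format(item_uuid, key)
    # Create /tmp/<uuid>/ plus every folder contained in the key (e.g. test1/)
    os.makedirs(os.path.dirname(download_path), exist_ok=True)
Note: it's better to use os.path.join() when working with system paths. The code above could be rewritten as:
# ...
else:  # File is uploaded inside a folder
    item_uuid = str(uuid.uuid4())
    # os.path.join takes the parts as separate arguments, not as a list
    download_path = os.path.join('/tmp', item_uuid, key)
    os.makedirs(os.path.dirname(download_path), exist_ok=True)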
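Putting it together, here is a minimal sketch of the whole download loop under a few assumptions: s3_client is a boto3 S3 client, and prepare_download_path and download_records are hypothetical helper names. It also URL-decodes the key with urllib.parse.unquote_plus, because keys in S3 event notifications are URL-encoded (a space arrives as '+'). Note that Key is the object key in the bucket, while only the local Filename lives under /tmp.

```python
import os
import uuid
from urllib.parse import unquote_plus


def prepare_download_path(key, base_dir='/tmp'):
    """Build a unique local path for an S3 key and create its parent folders."""
    download_path = os.path.join(base_dir, str(uuid.uuid4()), key)
    # Creates <base_dir>/<uuid>/ and any "folders" in the key, e.g. test1/
    os.makedirs(os.path.dirname(download_path), exist_ok=True)
    return download_path


def download_records(event, s3_client, base_dir='/tmp'):
    """Download every object referenced in an S3 event; return the local paths."""
    paths = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Event keys are URL-encoded, so decode before using them
        key = unquote_plus(record['s3']['object']['key'])
        download_path = prepare_download_path(key, base_dir)
        # Key is the S3 object key; Filename is the local path under /tmp
        s3_client.download_file(Bucket=bucket, Key=key, Filename=download_path)
        paths.append(download_path)
    return paths


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime
    return download_records(event, boto3.client('s3'))
```

Passing the client in as a parameter keeps the helpers easy to exercise locally with a fake object. This sketch drops the .json filter from the question; add it back inside the loop if you only want JSON files.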
The error may also be raised if '/tmp/' ends up in the S3 key you download. /tmp belongs only in the local download path; do not include it in the object key, since that folder most likely does not exist in the bucket. These articles may help confirm you are on the right track:
Amazon S3 upload and download using Python/Django
Python s3 examples
I faced the same issue, and the error message caused a lot of confusion because of the random string extension after the file name. In my case it was caused by a directory in the download path that didn't exist.
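The random extension comes from how the download is written: as far as I know, boto3's transfer layer (s3transfer) first writes to a temporary file named after the target plus '.' and eight random hex digits (the .586ea9b8 in the question), then renames it to the final name. If the directory is missing, opening that temporary file is what fails, so the error reports the temporary name. The sketch below imitates that pattern with a hypothetical write_via_temp_file helper; it is an illustration, not boto3's actual code.

```python
import os
import uuid


def write_via_temp_file(path, data):
    """Write data to path through a temporary sibling file (illustrative only)."""
    # Temporary name: path + '.' + 8 random hex digits, like the error message
    temp_path = '{}.{}'.format(path, uuid.uuid4().hex[:8])
    with open(temp_path, 'wb') as f:  # raises here if the parent dir is missing
        f.write(data)
    os.replace(temp_path, path)  # move the finished file to its final name


# A directory that certainly does not exist, mirroring /tmp/<uuid>/test1/
missing_path = os.path.join('/tmp', uuid.uuid4().hex, 'test1', 'customer1.json')
try:
    write_via_temp_file(missing_path, b'{}')
except FileNotFoundError as err:
    # err.filename is the temporary name, ending in .customer1.json.<8 hex>
    print(err)
```

Creating the parent directories before the download, as described in the first answer, removes the error in both the temporary and the final name.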