
S3 Python - Multipart upload to S3 with presigned part URLs


I'm unsuccessfully trying to do a multipart upload with pre-signed part URLs.

This is the procedure I follow (steps 1-3 run on the server, step 4 on the client):

  1. Instantiate boto client.
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    region_name=aws.default_region,
    aws_access_key_id=aws.access_key_id,
    aws_secret_access_key=aws.secret_access_key,
    config=Config(signature_version="s3v4")
)
  2. Initiate the multipart upload.
from datetime import datetime, timedelta

upload = s3.create_multipart_upload(
    Bucket=AWS_S3_BUCKET,
    Key=key,
    Expires=datetime.now() + timedelta(days=2),
)
upload_id = upload["UploadId"]
  3. Create a pre-signed URL for the part upload.

part = generate_part_object_from_client_submited_data(...)

part.presigned_url = s3.generate_presigned_url(
    ClientMethod="upload_part",
    Params={
        "Bucket": AWS_S3_BUCKET,
        "Key": upload_key,
        "UploadId": upload_id,
        "PartNumber": part.no,
        "ContentLength": part.size,
        "ContentMD5": part.md5,
    },
    ExpiresIn=3600,  # 1h
    HttpMethod="PUT",
)

Return the pre-signed URL to the client.

  4. On the client, try to upload the part using requests.
import io

import requests

part = receive_part_object_from_server(...)

# Read only this part's byte range from the file.
with open(filename, "rb") as f:
    f.seek(part.offset)
    buffer = io.BytesIO(f.read(part.size))

r = requests.put(
    part.presigned_url,
    data=buffer,
    headers={
        "Content-Length": str(part.size),
        "Content-MD5": part.md5,
        "Host": "AWS_S3_BUCKET.s3.amazonaws.com",
    },
)
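For reference, the per-part values used above (part.offset, part.size, part.md5) could be derived roughly as follows. This is a minimal sketch using only the standard library; the helper names plan_parts and part_md5_b64 and the 5 MiB default part size are illustrative assumptions, not part of the original code. It relies on two S3 rules: every part except the last must be at least 5 MiB, and Content-MD5 must carry the base64-encoded binary MD5 digest rather than the hex string.

```python
import base64
import hashlib

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def plan_parts(total_size, part_size=MIN_PART_SIZE):
    """Return (part_number, offset, size) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        size = min(part_size, total_size - offset)
        parts.append((number, offset, size))
        offset += size
        number += 1
    return parts


def part_md5_b64(data):
    """Base64-encode the raw MD5 digest, as the Content-MD5 header expects."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
```

If the server signed a ContentMD5 value, the client has to send exactly the same string in its Content-MD5 header, so both sides should compute it the same way.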

And when I try to upload I either get:

urllib3.exceptions.ProtocolError:
('Connection aborted.', BrokenPipeError(32, 'Broken pipe'))

Or:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchUpload</Code>
  <Message>
    The specified upload does not exist. The upload ID may be invalid,
    or the upload may have been aborted or completed.
  </Message>
  <UploadId>CORRECT_UPLOAD_ID</UploadId>
  <RequestId>...</RequestId>
  <HostId>...</HostId>
</Error>

Even though the upload still exists and I can list it.

Can anyone tell me what I am doing wrong?
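When the response is an error document like the one above, it can help to pull out the fields programmatically and compare the echoed upload ID with the one the server issued. A minimal sketch with the standard library; parse_s3_error is a hypothetical helper name, and the sample body is a shortened stand-in for a real response (in practice it would be the bytes in r.content).

```python
import xml.etree.ElementTree as ET


def parse_s3_error(body):
    """Extract the child elements of an S3 <Error> document into a dict."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError("not an S3 error document")
    return {child.tag: (child.text or "").strip() for child in root}


# Made-up sample; a real body would come from the failed PUT response.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchUpload</Code>
  <Message>The specified upload does not exist.</Message>
  <UploadId>EXAMPLE_UPLOAD_ID</UploadId>
</Error>"""

error = parse_s3_error(sample)
print(error["Code"], error["UploadId"])
```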

Viktor Kerkez, asked Sep 13 '19 19:09



2 Answers

Here is a command-line utility that does exactly the same thing; you might want to give it a try and see if it works. If it does, it will be easy to find the difference between your code and theirs. If it doesn't, I would double-check the whole process. Here is an example of how to upload a file using the AWS command line: https://aws.amazon.com/premiumsupport/knowledge-center/s3-multipart-upload-cli/?nc1=h_ls

If it does work, i.e. you can replicate the upload using aws s3 commands, then we need to focus on the use of the presigned URL. You can check what the URL should look like here:

https://github.com/aws/aws-sdk-js/issues/468
https://github.com/aws/aws-sdk-js/issues/1603

These are for the JS SDK, but the people there discuss the raw URLs and parameters, so you should be able to spot the difference between your URLs and the ones that work.

Another option is to try this script, which uses JS to upload a file from a web browser using presigned URLs.

https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload

If it works, you can inspect the traffic and observe the exact URLs used to upload each part, then compare them with the URLs your system generates.
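Comparing two presigned URLs by eye is error-prone, so a small script can do it. The sketch below uses only the standard library; describe and diff_urls are hypothetical helper names, and both example URLs are made up for illustration. It ignores the signature and timestamp, which differ on every signing, and reports everything else that does not match, including the X-Amz-SignedHeaders list that tells you which headers the client must send.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that change on every signing and are not worth comparing.
VOLATILE = {"X-Amz-Signature", "X-Amz-Date"}


def describe(url):
    """Break a presigned URL into host, path and its stable query parameters."""
    parts = urlsplit(url)
    fields = {"host": parts.netloc, "path": parts.path}
    for key, values in parse_qs(parts.query, keep_blank_values=True).items():
        if key not in VOLATILE:
            fields["query:" + key] = values[0]
    return fields


def diff_urls(working, failing):
    """Return {field: (working_value, failing_value)} for every mismatch."""
    a, b = describe(working), describe(failing)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Made-up URLs: one captured from a working upload, one from your own system.
working = (
    "https://example-bucket.s3.eu-west-1.amazonaws.com/data.bin"
    "?partNumber=1&uploadId=abc&X-Amz-SignedHeaders=host&X-Amz-Signature=111"
)
failing = (
    "https://example-bucket.s3.amazonaws.com/data.bin"
    "?partNumber=1&uploadId=abc"
    "&X-Amz-SignedHeaders=content-md5%3Bhost&X-Amz-Signature=222"
)

for field, (good, bad) in diff_urls(working, failing).items():
    print(f"{field}: {good!r} vs {bad!r}")
```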

Btw, once you have a working URL for the multipart upload, you can try aws s3 presign to obtain a presigned URL (as far as I know it only signs GET requests, so the part URLs may still have to come from an SDK). This should let you finish the upload using just curl, which gives you full control over the upload process.

Piotr Czapla, answered Nov 03 '22 19:11


Did you try a pre-signed POST instead? Here is the AWS reference for it (this page is from the SDK for PHP guide; boto3 exposes the same feature as generate_presigned_post): https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-presigned-post.html

This could potentially work around proxy limitations on the client side, if there are any.

As a last resort, you can always try the good old REST API, although I don't think the issue is in your code or in boto3: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html

Fabio Manzano, answered Nov 03 '22 20:11