
Direct Upload to S3 Using Python/Boto/Django to Construct Policy

I have been through many iterations of this problem, looked at many different examples, and have been all through the documentation.

I am trying to combine Plupload (http://www.plupload.com/) with the AWS S3 direct post method (http://aws.amazon.com/articles/1434). However, I believe there's something wrong with the way I am constructing my policy and signature for transmission. When I submit the form, I don't get a response from the server, but rather my connection to the server is reset.

I have attempted to use the Python code from the example:

import base64
import hmac, sha

policy = base64.b64encode(policy_document)

signature = base64.b64encode(
    hmac.new(aws_secret_key, policy, sha).digest())

I have also tried to use the more up-to-date hashlib library in Python. Whatever method I use to construct my policy and signature, I always get different values than those generated here:

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html
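
For reference, my hashlib-based attempt looks roughly like this (aws_secret_key and policy_document stand in for my real values):

import base64
import hmac, hashlib

policy = base64.b64encode(policy_document)

signature = base64.b64encode(
    hmac.new(aws_secret_key, policy, hashlib.sha1).digest())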

I have read through this question:

How do I make Plupload upload directly to Amazon S3?

But I found the examples provided to be overly complicated and wasn't able to accurately implement them.

My most recent attempts have been to use portions of the boto library:

http://boto.cloudhackers.com/ref/s3.html#module-boto.s3.connection

But using the S3Connection.build_post_form_args method has not worked for me either.

If anyone could provide a proper example of how to create the post form using python, I would very much appreciate it. Even some simple insights on why the connection is always reset would be nice.

Some caveats:

- I would like to use hashlib if possible.
- I want to get an XML response from Amazon (presumably "success_action_status = '201'" does this).
- I need to be able to upload fairly large files, up to ~2GB.

One final note: when I run this in Chrome, it shows upload progress, but the upload usually fails at around 37%.

asked Aug 19 '11 by MrOodles


2 Answers

Nathan's answer helped get me started. I've included two solutions that are currently working for me.

The first solution uses plain Python. The second uses boto.

I tried to get boto working first, but kept getting errors. So I went back to the Amazon Ruby documentation (Browser Uploads to S3 using HTML POST) and got S3 to accept files using Python without boto.

After understanding what was going on, I was able to fix my errors and use boto, which is a simpler solution.

I'm including solution 1 because it shows explicitly how to set up the policy document and signature using Python.

My goal was to create the HTML upload page as a dynamic page, along with the "success" page the user sees after a successful upload. Solution 1 shows the dynamic creation of the form upload page, while solution 2 shows the creation of both the upload form page and the success page.

Solution 1:

import base64
import hmac, hashlib

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
EXPIRE_DATE = "2015-01-01T00:00:00Z" # Jan 1, 2015 gmt
FILE_TO_UPLOAD = "${filename}"
BUCKET = "media.mysite.com"
KEY = ""
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = ""
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or "https" for better security
PAGE_TITLE = "My Html Upload to S3 Form"
ACTION = "%s://%s.s3.amazonaws.com/" % (HTTP_OR_HTTPS, BUCKET)

###### DON'T EDIT FROM HERE ON DOWN ######

policy_document_data = {
    "expire": EXPIRE_DATE,
    "bucket_name": BUCKET,
    "key_name": KEY,
    "acl_name": ACL,
    "success_redirect": SUCCESS,
    "content_name": CONTENT_TYPE,
    "content_length": CONTENT_LENGTH,
}

policy_document = """
{"expiration": "%(expire)s",
  "conditions": [ 
    {"bucket": "%(bucket_name)s"}, 
    ["starts-with", "$key", "%(key_name)s"],
    {"acl": "%(acl_name)s"},
    {"success_action_redirect": "%(success_redirect)s"},
    ["starts-with", "$Content-Type", "%(content_name)s"],
    ["content-length-range", 0, %(content_length)d]
  ]
}
""" % policy_document_data

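# Base64-encode the policy document, then sign the encoded policy
# with HMAC-SHA1 using the AWS secret key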
policy = base64.b64encode(policy_document)
signature = base64.b64encode(hmac.new(AWS_SECRET_KEY, policy, hashlib.sha1).digest())

html_page_data = {
    "page_title": PAGE_TITLE,
    "action_name": ACTION,
    "filename": FILE_TO_UPLOAD,
    "access_name": AWS_ACCESS_KEY,
    "acl_name": ACL,
    "redirect_name": SUCCESS,
    "policy_name": policy,
    "sig_name": signature,
    "content_name": CONTENT_TYPE,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action_name)s" method="post" enctype="multipart/form-data">
  <input type="hidden" name="key" value="%(filename)s">
  <input type="hidden" name="AWSAccessKeyId" value="%(access_name)s">
  <input type="hidden" name="acl" value="%(acl_name)s">
  <input type="hidden" name="success_action_redirect" value="%(redirect_name)s">
  <input type="hidden" name="policy" value="%(policy_name)s">
  <input type="hidden" name="signature" value="%(sig_name)s">
  <input type="hidden" name="Content-Type" value="%(content_name)s">

  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br /> <br />

  <input name="file" type="file"><br /> <br />
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "wb") as f:
    f.write(html_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    items = [
        "",
        "",
        "policy_document: %s" % policy_document,
        "policy: %s" % policy,
        "signature: %s" % signature,
        "",
        "",
    ]
    for item in items:
        g.es(item)
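
Since the question mentions Django, here is a minimal sketch (the view name is hypothetical) of serving the generated form directly instead of writing S3PostForm.html to disk:

from django.http import HttpResponse

def s3_upload_form(request):
    # html_page is the string built above (policy, signature, hidden form fields)
    return HttpResponse(html_page)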

Solution 2:

from boto.s3 import connection

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
SUCCESS_NAME = "success.html"
EXPIRES = 60*60*24*356 # seconds = 1 year
BUCKET = "media.mysite.com"
KEY = "${filename}" # will match file entered by user
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = "" # seems to work this way
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or https for better security
PAGE_TITLE = "My Html Upload to S3 Form"

###### DON'T EDIT FROM HERE ON DOWN ######

conn = connection.S3Connection(AWS_ACCESS_KEY,AWS_SECRET_KEY)
args = conn.build_post_form_args(
    BUCKET,
    KEY,
    expires_in=EXPIRES,
    acl=ACL,
    success_action_redirect=SUCCESS,
    max_content_length=CONTENT_LENGTH,
    http_method=HTTP_OR_HTTPS,
    fields=None,
    conditions=None,
    storage_class='STANDARD',
    server_side_encryption=None,
    )
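# build_post_form_args returns a dict with the form "action" URL and a
# "fields" list of {"name": ..., "value": ...} entries, used below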

form_fields = ""
line = '  <input type="hidden" name="%s" value="%s" >\n'
for item in args['fields']:
    new_line = line % (item["name"], item["value"])
    form_fields += new_line

html_page_data = {
    "page_title": PAGE_TITLE,
    "action": args["action"],
    "input_fields": form_fields,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action)s" method="post" enctype="multipart/form-data" >
%(input_fields)s
  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br /> <br />

  <input name="file" type="file"><br /> <br />
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "wb") as f:
    f.write(html_page)

success_page = """
<html>
  <head>
    <title>S3 POST Success Page</title>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <script src="jquery.js"></script>
      <script src="purl.js"></script>
<!--

    Amazon S3 passes three data items in the url of this page if
        the upload was successful:
        bucket = bucket name
        key = file name upload to the bucket
        etag = hash of file

    The following script parses these values and puts them in
    the page to be displayed.

-->

<script type="text/javascript">
var pname,url,val,params=["bucket","key","etag"];
$(document).ready(function()
{
  url = $.url();
  for (param in params)
  {
    pname = params[param];
    val = url.param(pname);
    if(typeof val != 'undefined')
      document.getElementById(pname).value = val;
  }
});
</script>

  </head>
  <body>
      <div style="margin:0 auto;text-align:center;">
      <p>Congratulations!</p>
      <p>You have successfully uploaded the file.</p>
        <form action="#" method="get"
          >Location:
        <br />
          <input type="text" name="bucket" id="bucket" />
        <br />File Name:
        <br />
          <input type="text" name="key" id="key" />
        <br />Hash:
        <br />
          <input type="text" name="etag" id="etag" />
      </form>
    </div>
  </body>
</html>
"""

with open(SUCCESS_NAME, "wb") as f:
    f.write(success_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    g.es("conn = %s" % conn)
    for key in args.keys():
        if key != "fields":
            g.es("%s: %s" % (key, args[key]))
            continue
        for item in args['fields']:
            g.es(item)

answered Sep 30 '22 by Speed Ream


I tried using Boto but found it didn't let me put in all of the headers I wanted. Below you can see what I do to generate the policy, signature, and a dictionary of post form values.

Note that all of the x-amz-meta-* tags are custom header properties and you don't need them. Also notice that pretty much everything that is going to be in the form needs to be in the policy that gets encoded and signed.

def generate_post_form(bucket_name, key, post_key, file_id, file_name, content_type):
  import hmac
  from hashlib import sha1
  from datetime import datetime
  from django.conf import settings
  policy = """{"expiration": "%(expires)s","conditions": [{"bucket":"%(bucket)s"},["eq","$key","%(key)s"],{"acl":"private"},{"x-amz-meta-content_type":"%(content_type)s"},{"x-amz-meta-file_name":"%(file_name)s"},{"x-amz-meta-post_key":"%(post_key)s"},{"x-amz-meta-file_id":"%(file_id)s"},{"success_action_status":"200"}]}"""
  policy = policy%{
    "expires":(datetime.utcnow()+settings.TIMEOUT).strftime("%Y-%m-%dT%H:%M:%SZ"), # This has to be formatted this way
    "bucket": bucket_name, # the name of your bucket
    "key": key, # this is the S3 key where the posted file will be stored
    "post_key": post_key, # custom properties begin here
    "file_id":file_id,
    "file_name": file_name,
    "content_type": content_type,
  }
  encoded = policy.encode('utf-8').encode('base64').replace("\n","") # Here we base64 encode a UTF-8 version of our policy.  Make sure there are no new lines, Amazon doesn't like them.
  return ("%s://%s.s3.amazonaws.com/"%(settings.HTTP_CONNECTION_TYPE, self.bucket_name),
          {"policy":encoded,
           "signature":hmac.new(settings.AWS_SECRET_KEY,encoded,sha1).digest().encode("base64").replace("\n",""), # Generate the policy signature using our Amazon Secret Key
           "key": key,
           "AWSAccessKeyId": settings.AWS_ACCESS_KEY, # Obviously the Amazon Access Key
           "acl":"private",
           "x-amz-meta-post_key":post_key,
           "x-amz-meta-file_id":file_id,
           "x-amz-meta-file_name": file_name,
           "x-amz-meta-content_type": content_type,
           "success_action_status":"200",
          })

The returned tuple can then be used to generate a form that posts to the generated S3 url with all of the key value pairs from the dictionary as hidden fields and your actual file input field, whose name/id should be "file".
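
As a concrete example, a minimal sketch of turning that return value into a form (assuming the tuple is unpacked as url, fields; the surrounding variables mirror the function's parameters) might look like:

url, fields = generate_post_form(bucket_name, key, post_key, file_id, file_name, content_type)

hidden_inputs = "\n".join(
    '<input type="hidden" name="%s" value="%s" />' % (name, value)
    for name, value in fields.items())

form_html = """<form action="%s" method="post" enctype="multipart/form-data">
%s
<input type="file" name="file" id="file" />
<input type="submit" value="Upload" />
</form>""" % (url, hidden_inputs)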

Hope that helps as an example.

answered Sep 30 '22 by White Box Dev