Unable to set file permissions in S3 using boto and django

I have been trying to find a solution to this for about 36 hours now, so hopefully I'm not duplicating a question or asking something obvious. I am building a web app that manipulates files I store in S3 and puts the new versions back in S3 with a 'public-read' ACL. A different page then allows you to view the updated file. The app lives on an Amazon EC2 server and connects to an Amazon S3 bucket.

I am using django, celery, and boto to do this. I have a celery task that gets some info from one of my views, does the processing, and then posts the new file to S3. I am able to get the original file from S3, manipulate it successfully, and repost it. The only thing that doesn't work is changing the permissions on that file: when you go to the viewing page, the request for the file returns a 403 (Forbidden) error.

If I go into S3 myself and change the permission on that file so everyone can read it, it all works. Before I go on, the code I use in my task, which almost works, is:

import boto
from boto.s3.key import Key
from django.conf import settings

name = 'filename.blah'
conn = boto.connect_s3()
b = conn.get_bucket(settings.AWS_STORAGE_BUCKET_NAME)
grab_from_S3(name, b)    # pull the original file down from S3
out_name = conv(name)    # convert it locally
send_to_S3(out_name, b)  # push the converted file back up

where the functions in there are:

def grab_from_S3(file, bucket):
    k = Key(bucket)
    k.key = file
    k.get_contents_to_filename(file)    # download the object to a local file

def send_to_S3(file, bucket):
    k = Key(bucket)
    k.key = file
    k.set_contents_from_filename(file)  # upload the local file
    k.set_acl('public-read')            # this is the call that has no effect

and conv(name) just does some conversion work. So this works almost all the way, except that the file's permissions are not 'public-read'. I assume the AWS credentials and bucket name are being picked up from the environment properly, because the task is able to push and pull files to and from S3.

The big confusing part is that when I open a python environment, from either the venv on my EC2 server or the python that was installed on it to begin with, and run all the commands I show above, it DOES work. I can change the permission with no problems. And when the task runs, it does not throw any errors in the celery logs, so I don't think the task is actually running into errors. It simply isn't changing what it's supposed to change.
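
For reference, one way to narrow a problem like this down is to inspect, from that same shell, the ACL that actually ended up on the object. A minimal sketch using boto 2's get_acl (the bucket name is a placeholder; 'filename.blah' is just the example key from above):

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket-name')  # substitute your bucket name
key = bucket.get_key('filename.blah')       # the example key from above
policy = key.get_acl()
for grant in policy.acl.grants:
    # a public-read object shows a READ grant for the AllUsers group URI
    print("%s %s" % (grant.permission, grant.uri or grant.id))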

Things that I have tried:

  1. I tried other versions of the permission calls, such as k.set_contents_from_filename(file, policy='public-read'), k.make_public(), and b.set_acl('public-read', out_name), but none of those worked either (spelled out in the sketch after this list).
  2. I changed the permissions on the bucket so that everyone was allowed to change permissions, and it still didn't work.
  3. I tried changing the bucket policy to the one below, and it had no effect:

    { "Version": "2008-10-17", "Id": "whatever", "Statement": [ { "Sid": "whatever", "Effect": "Allow", "Principal": { "AWS": "*" }, "Action": [ "s3:PutObjectAcl", "s3:PutObject"], "Resource": [ "arn:aws:s3:::bucket_name", "arn:aws:s3:::bucket_name/*" ] } ] }

In the end, I'm really confused because I can do all of this just fine from a python environment on the same EC2 instance, but not from the code running on that instance. I've searched and searched and haven't been able to find any suggestions that worked. Another possibly useful piece of info (it might be irrelevant depending on the problem) is that if I try to connect to S3 in my view with commands similar to those above, it returns an error:

"No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials"

even though those commands work when run in my task (I would assume the access key or secret access key was wrong, but everything else works with them). I think I'm importing the parts of the boto library I need correctly in the python code.
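
As an aside for anyone hitting that error: it usually means boto found no credentials at all in the process that ran the code (different processes can see different environment variables). One way to rule that out is to pass the credentials explicitly. A sketch, where the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY settings names are an assumption about how the project stores them:

import boto
from django.conf import settings

# Pass credentials explicitly instead of relying on the environment.
conn = boto.connect_s3(
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)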

I just recently set this instance up, so it probably has close to the newest versions of boto, celery, django, etc. on it. I probably forgot something. Please let me know if you need more info to answer the question. I'm really not sure what is going on.

Thanks a ton in advance.

asked Sep 30 '13 by barragan


1 Answer

I solved the problem myself after about 4 days, and the answer was right under my nose the whole time. So, for the sake of anybody else who might happen across this, I will expose my silliness.

I am very, very new to celery. What I did not realize is that every time you change your celery tasks, the workers need to be restarted to see the changes. This was never a problem for me before, because I always started the workers myself while developing, but I recently switched to running celery as a daemon. So this was the first change I made while celery was already running.

The answer was that I just needed to restart the daemon so that it would pick up my changes. It all works now. I searched the celery documentation and getting-started guides for a line reminding you to do this when you change your task code, but did not see anything obvious. I found this through some other answers:

http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.autoreload.html

That can be useful for development. But I haven't seen any explicit line telling people new to celery to make sure they are aware of needing to restart the workers. Perhaps this is obvious and I am just too new. If anyone knows of a link to some info about it, posting it here would be welcome, as someone might want to read it in the future. Sorry for wasting everyone's time.
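
For the record: with the generic init scripts from celery's daemonization guide, restarting the worker daemon is typically something like "sudo /etc/init.d/celeryd restart", though the exact command depends on how the daemon was set up. The autoreload module linked above corresponded to the experimental --autoreload option of the celery worker command in Celery 3.x, intended for development use only.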

answered Nov 04 '22 by barragan