Best practices for storing references to AWS S3 objects in a database?

Tags: postgresql, django, amazon-s3

We store files in Amazon S3 and want to keep references to those files in a Document table in Postgres. I am looking for best practices. We use Python/Django, and currently store the URL that comes back from boto.s3.key.Key().generate_url(...). But there are several issues with that:

Must parse the bucket and key out of the URL.
Need to urldecode the key name.
Doesn't support object versioning.
Unicode support is easy to mess up, especially because of the urlencode/decode steps.

So I'm considering storing the bucket, key, and version in three separate fields, and building the key from the DB primary key plus a safely encoded filename. Are there better approaches?
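
To make the three-field idea concrete, here is a minimal sketch in plain Python. Everything named here (S3ObjectRef, safe_filename, build_key, the "documents" prefix, "example-bucket") is made up for illustration; in Django each attribute would presumably become its own model field.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3ObjectRef:
    """The three values that would each get their own column (hypothetical names)."""
    bucket: str
    key: str
    version_id: Optional[str] = None  # None when bucket versioning is not in use


def safe_filename(filename: str) -> str:
    """Reduce a user-supplied filename to ASCII letters, digits, '.', '_' and '-'."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", filename)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of disallowed characters with a single underscore,
    # then trim leading/trailing dots and underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._")
    return cleaned or "file"


def build_key(pk: int, filename: str, prefix: str = "documents") -> str:
    """Combine the row's primary key with the sanitized filename."""
    return f"{prefix}/{pk}/{safe_filename(filename)}"


# Assumed example values: row 42, a filename with accents and spaces.
ref = S3ObjectRef(bucket="example-bucket", key=build_key(42, "Résumé (final).pdf"))
print(ref.key)
```

Since the key never needs URL encoding or decoding, the Unicode round trip goes away. The sanitizing step is lossy, though, so the original filename would need its own column if it has to be shown to users.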


asked Nov 13 '17 16:11

Scott Stafford

1 Answer

Not sure if it's the best approach, but we store:

a unique object ID (possibly a UUID, for which Postgres has a native UUID type) in the database
the bucket name and path in configuration (we store all objects of the same type under the same bucket + path)

That way you can at least:

Move objects to a different bucket / path without having to rewrite your whole database table
Switch from S3 to local storage if you choose to
Throw away your primary key (e.g. while partitioning tables) without losing track of your objects
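
Roughly, the lookup could look like the sketch below. This is only an illustration under assumptions: StorageConfig, DOCUMENT_STORAGE, object_location and the bucket/path values are hypothetical names, not our actual code.

```python
import uuid
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StorageConfig:
    """Bucket and path shared by all objects of one type; lives in configuration."""
    bucket: str
    path: str


# Assumed example configuration for one object type.
DOCUMENT_STORAGE = StorageConfig(bucket="example-bucket", path="documents")


def object_location(object_id: uuid.UUID, config: StorageConfig) -> Tuple[str, str]:
    """Derive (bucket, key) from the stored ID and the current configuration."""
    key = f"{config.path.strip('/')}/{object_id}"
    return config.bucket, key


# Only this ID would be stored in the database row.
object_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
bucket, key = object_location(object_id, DOCUMENT_STORAGE)

# Moving everything to another bucket/path means changing the config, not the rows.
moved = object_location(object_id, StorageConfig(bucket="archive-bucket", path="docs/v2"))
```

The database only ever holds the ID, so the same row resolves to a new location as soon as the configuration changes.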


answered Sep 20 '22 13:09

Linas Valiukas
