I'm using django staticfiles + django-storages with Amazon S3 to host my static files. All is working well, except that every time I run manage.py collectstatic
the command re-uploads every file to S3.
It looks like the management command compares timestamps from Storage.modified_time(),
which isn't implemented in the S3 storage backend from django-storages.
How do you guys determine if an S3 file has been modified?
I could store file paths and last-modified dates in my database. Or is there an easy way to pull the last-modified date from Amazon?
Another option: it looks like I can assign arbitrary metadata with python-boto
where I could put the local modified date when I upload the first time.
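Something along these lines with boto, for example (the bucket name and metadata key here are invented for illustration):

from boto.s3.connection import S3Connection

conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)  # credentials elided
bucket = conn.get_bucket('my-static-bucket')

# On first upload, stash the local mtime as custom metadata
# (S3 stores it as an x-amz-meta-local-modified header).
key = bucket.new_key('css/site.css')
key.set_metadata('local-modified', '2012-07-18T09:00:00')
key.set_contents_from_filename('static/css/site.css')

# Later, read it back and compare against the local file's mtime.
stamp = bucket.get_key('css/site.css').get_metadata('local-modified')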
Anyways, it seems like a common problem so I'd like to ask what solution others have used. Thanks!
The latest version of django-storages
(1.1.3) handles file modification detection through S3 Boto.
pip install django-storages
and you're good now :) Gotta love open source!
Update: set the AWS_PRELOAD_METADATA
option to True
in your settings file to get very fast syncs if you're using the S3Boto class. If you're using the plain S3 backend, use its PreloadedS3 class instead.
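For reference, that amounts to something like this in your settings file (a sketch for the S3Boto backend; bucket name and credentials omitted):

# settings.py
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_PRELOAD_METADATA = True  # fetch the bucket listing once up front instead of one request per file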
Update 2: It's still extremely slow to run the command.
Update 3: I forked the django-storages repository to fix the issue and added a pull request.
The problem is in the modified_time method, where the fallback value is evaluated even when it isn't needed. I moved the fallback into an if block that only runs when get returns None.
entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))
Should be
entry = self.entries.get(name)
if entry is None:
entry = self.bucket.get_key(self._encode_name(name))
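In context, the patched method ends up looking roughly like this (a sketch of S3BotoStorage.modified_time, not the exact upstream source; parse_ts is boto.utils.parse_ts):

def modified_time(self, name):
    name = self._normalize_name(self._clean_name(name))
    entry = self.entries.get(name)  # preloaded cache when AWS_PRELOAD_METADATA is on
    if entry is None:
        # Only fall back to a network round trip on a cache miss.
        entry = self.bucket.get_key(self._encode_name(name))
    # entry.last_modified is the timestamp string S3 returns.
    return parse_ts(entry.last_modified)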
With that change, 1000 requests go from about 100s down to under 0.5s.
For syncing 10k+ files, I believe boto has to make multiple requests since S3 paginates its list results, causing a 5-10 second sync time. This will only get worse as the number of files grows.
I'm thinking the solution is a custom management command or a django-storages
update that keeps a file on S3 holding the metadata of all the other files, updated any time a file is uploaded via the collectstatic
command.
It won't detect files uploaded via other means, but that doesn't matter if the sole entry point is the management command.
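A minimal sketch of that idea with boto (the manifest name and JSON layout are assumptions, not an existing django-storages feature):

import json
from boto.s3.connection import S3Connection

MANIFEST_KEY = '.collectstatic-manifest.json'  # hypothetical name

def load_manifest(bucket):
    # A single GET retrieves the stored metadata for every file.
    key = bucket.get_key(MANIFEST_KEY)
    return json.loads(key.get_contents_as_string()) if key else {}

def save_manifest(bucket, manifest):
    # Rewritten at the end of each collectstatic run.
    bucket.new_key(MANIFEST_KEY).set_contents_from_string(json.dumps(manifest))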
I answered the same question here https://stackoverflow.com/a/17528513/1220706 . Check out https://github.com/FundedByMe/collectfast . It's a pluggable Django app that caches the ETag of remote S3 files and compares the cached checksum instead of performing a lookup every time. Follow the installation instructions and run collectstatic
as normal. It took me from an average of around 1m30s down to about 10s per deploy.
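For reference, the setup at the time boiled down to something like this (check the collectfast README for the current instructions, as they may have changed):

# settings.py
AWS_PRELOAD_METADATA = True  # collectfast relies on the preloaded bucket listing
INSTALLED_APPS = (
    # ...
    'collectfast',
)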