I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.
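The loop is roughly this shape (the handled field, the process() helper, and the app path are placeholders for the real code):

import time

from myapp.models import MyModel   # placeholder app/model names


def process(item):
    """Placeholder for the real per-item processing."""
    pass


def poll_forever():
    while True:
        # A brand-new QuerySet is built on every iteration
        for item in MyModel.objects.filter(handled=False):
            process(item)
            item.handled = True
            item.save()
        time.sleep(5)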
What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:
>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025
As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.
I can't find any documentation on that update() method and have no idea what other bad things it might do.
My question is, why am I seeing this caching behavior, which contradicts what the Django docs say? And how do I prevent it from happening?
Django has a default caching system in the form of local-memory caching. It can handle multi-threaded processes and is a reasonable choice for projects that cannot use Memcached.
To use caching in Django, the first thing to do is decide where the cache will live. The cache framework offers several options - the cache can be stored in the database, on the file system, or directly in memory. This is configured in your project's settings.py file.
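For example, a local-memory cache would be configured in settings.py roughly like this (the LOCATION string is arbitrary; note that this cache framework is separate from the QuerySet behaviour discussed in the question):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake',   # arbitrary name; only matters if you define several caches
    },
}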
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
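For example (the handled field is only illustrative):

qs = MyModel.objects.filter(handled=False)   # building the QuerySet runs no query
count = qs.count()    # runs a COUNT(*) query now, without caching any rows
items = list(qs)      # evaluating the QuerySet runs the SELECT and caches the results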
Having had this problem and found two definitive solutions for it, I thought it worth posting another answer.
This is a problem with MySQL's default transaction mode. Django opens a transaction at the start, which means that by default you won't see changes committed by other processes.
You can demonstrate it like this.
Run a Django shell in terminal 1:
>>> MyModel.objects.get(id=1).my_field
u'old'
And another in terminal 2
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
Back to terminal 1 to demonstrate the problem - we still read the old value from the database.
>>> MyModel.objects.get(id=1).my_field
u'old'
Now, in terminal 1, demonstrate the solution:
>>> from django.db import transaction
>>>
>>> @transaction.commit_manually
... def flush_transaction():
...     transaction.commit()
...
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
The new data is now read.
Here is that code in an easy-to-paste block with a docstring:
from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    """
    Flush the current transaction so we don't read stale data

    Use in long running processes to make sure fresh data is read from
    the database.

    This is a problem with MySQL and the default transaction mode. You
    can fix it by setting "transaction-isolation = READ-COMMITTED" in
    my.cnf or by calling this function at the appropriate moment
    """
    transaction.commit()
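In a long-running process like the one in the question, you would call it at the top of each polling iteration, along these lines (placeholder names again):

import time

from myapp.models import MyModel   # placeholder app/model names

while True:
    flush_transaction()    # discard the stale snapshot before querying
    for item in MyModel.objects.filter(handled=False):
        item.handled = True
        item.save()
    time.sleep(5)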
The alternative solution is to change my.cnf for MySQL so that the default transaction isolation level is READ-COMMITTED:
transaction-isolation = READ-COMMITTED
Note that this is a relatively new feature for MySQL and has some consequences for binary logging / replication. You could also put this in the Django connection preamble if you wanted.
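For example, with the MySQL backend you could set it per connection in settings.py (database name and credentials here are placeholders):

# settings.py - set the isolation level in the connection preamble instead of my.cnf
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',          # placeholder
        'USER': 'myuser',        # placeholder
        'PASSWORD': 'secret',    # placeholder
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}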
Update 3 years later
Now that Django 1.6 has turned on autocommit in MySQL this is no longer a problem. The example above now works fine without the flush_transaction() code, whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.
What was happening in previous versions of Django, which ran in non-autocommit mode, was that the first select statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ, this means that no updates to the database will be read by subsequent select statements - hence the need for the flush_transaction() code above, which stops the transaction and starts a new one.
There are still reasons why you might want to use READ-COMMITTED transaction isolation though. If you were to put terminal 1 in a transaction and you wanted to see the writes from terminal 2, you would need READ-COMMITTED.
The flush_transaction() code now produces a deprecation warning in Django 1.6, so I recommend you remove it.
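As a rough sketch of that explicit-transaction scenario: under Django 1.6+ you would open the transaction with transaction.atomic(), and inside it the isolation level matters again:

from django.db import transaction

from myapp.models import MyModel   # placeholder model

# Outside a transaction (autocommit, the Django 1.6+ default) every query
# sees the latest committed data. Inside an explicit transaction:
with transaction.atomic():
    first = MyModel.objects.get(id=1).my_field
    # ... another process commits a change to this row here ...
    second = MyModel.objects.get(id=1).my_field
    # REPEATABLE-READ (MySQL default): second == first (snapshot from first read)
    # READ-COMMITTED: second reflects the other process's commit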
We've struggled a fair bit with forcing Django to refresh the "cache" - which, it turns out, wasn't really a cache at all but an artifact of transactions. This might not apply to your example, but certainly in Django views an implicit transaction is opened by default, which MySQL then isolates from any changes that happen in other processes after you start.
We used the @transaction.commit_manually decorator and called transaction.commit() just before every occasion where we needed up-to-date info.
As I say, this definitely applies to views; I'm not sure whether it would apply to Django code that isn't run inside a view.
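A minimal sketch of that pattern (pre-Django-1.6 API, with a hypothetical MyModel and handled field):

from django.db import transaction
from django.http import HttpResponse

from myapp.models import MyModel   # placeholder model


@transaction.commit_manually        # pre-1.6 API; deprecated in 1.6 and removed later
def unhandled_count(request):
    transaction.commit()            # end the implicit transaction so fresh rows are visible
    count = MyModel.objects.filter(handled=False).count()
    transaction.commit()            # commit_manually requires an explicit commit or rollback
    return HttpResponse(str(count))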
detailed info here:
http://devblog.resolversystems.com/?p=439