I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.
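The loop is roughly this shape (the handled field, the process() helper, and the app path are placeholders for the real code):

import time

from myapp.models import MyModel   # placeholder app/model names


def process(item):
    """Placeholder for the real per-item processing."""
    pass


def poll_forever():
    while True:
        # A brand-new QuerySet is built on every iteration
        for item in MyModel.objects.filter(handled=False):
            process(item)
            item.handled = True
            item.save()
        time.sleep(5)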
What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:
>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025
As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.
I can't find any documentation on that update() method and have no idea what other bad things it might do.
My question is, why am I seeing this caching behavior, which contradicts what the Django docs say? And how do I prevent it from happening?
Django has a default caching system in the form of local-memory caching. It can handle multi-threaded processes and is a reasonable choice for projects that cannot use Memcached.
To use caching in Django, the first thing to do is decide where the cache will live. The cache framework offers several options - the cache can be stored in the database, on the file system, or directly in memory. This is configured in your project's settings.py file.
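For example, a local-memory cache would be configured in settings.py roughly like this (the LOCATION string is arbitrary; note that this cache framework is separate from the QuerySet behaviour discussed in the question):

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'unique-snowflake',   # arbitrary name; only matters if you define several caches
    },
}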
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
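For example (the handled field is only illustrative):

qs = MyModel.objects.filter(handled=False)   # building the QuerySet runs no query
count = qs.count()    # runs a COUNT(*) query now, without caching any rows
items = list(qs)      # evaluating the QuerySet runs the SELECT and caches the results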
Having had this problem and found two definitive solutions for it, I thought it worth posting another answer.
This is a problem with MySQL's default transaction mode. Django opens a transaction at the start, which means that by default you won't see changes committed by other processes.
You can demonstrate it like this.
Run a Django shell in terminal 1:
>>> MyModel.objects.get(id=1).my_field
u'old'
And another in terminal 2
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
Back to terminal 1 to demonstrate the problem - we still read the old value from the database.
>>> MyModel.objects.get(id=1).my_field
u'old'
Now, in terminal 1, demonstrate the solution:
>>> from django.db import transaction
>>>
>>> @transaction.commit_manually
... def flush_transaction():
...     transaction.commit()
...
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
The new data is now read.
Here is that code in an easy-to-paste block with a docstring:
from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    """
    Flush the current transaction so we don't read stale data

    Use in long running processes to make sure fresh data is read from
    the database.

    This is a problem with MySQL and the default transaction mode. You
    can fix it by setting "transaction-isolation = READ-COMMITTED" in
    my.cnf or by calling this function at the appropriate moment
    """
    transaction.commit()
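In a long-running process like the one in the question, you would call it at the top of each polling iteration, along these lines (placeholder names again):

import time

from myapp.models import MyModel   # placeholder app/model names

while True:
    flush_transaction()    # discard the stale snapshot before querying
    for item in MyModel.objects.filter(handled=False):
        item.handled = True
        item.save()
    time.sleep(5)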
The alternative solution is to change my.cnf for MySQL so that the default transaction isolation level is READ-COMMITTED:
transaction-isolation = READ-COMMITTED
Note that this is a relatively new feature for MySQL and has some consequences for binary logging / replication. You could also put this in the Django connection preamble if you wanted.
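For example, with the MySQL backend you could set it per connection in settings.py (database name and credentials here are placeholders):

# settings.py - set the isolation level in the connection preamble instead of my.cnf
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',          # placeholder
        'USER': 'myuser',        # placeholder
        'PASSWORD': 'secret',    # placeholder
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}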
Update 3 years later
Now that Django 1.6 has turned on autocommit in MySQL this is no longer a problem. The example above now works fine without the flush_transaction() code, whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.
What was happening in previous versions of Django, which ran in non-autocommit mode, was that the first select statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ, this means that no updates to the database will be read by subsequent select statements - hence the need for the flush_transaction() code above, which stops the transaction and starts a new one.
There are still reasons why you might want to use READ-COMMITTED transaction isolation though. If you were to put terminal 1 in a transaction and you wanted to see the writes from terminal 2, you would need READ-COMMITTED.
The flush_transaction() code now produces a deprecation warning in Django 1.6, so I recommend you remove it.
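As a rough sketch of that explicit-transaction scenario: under Django 1.6+ you would open the transaction with transaction.atomic(), and inside it the isolation level matters again:

from django.db import transaction

from myapp.models import MyModel   # placeholder model

# Outside a transaction (autocommit, the Django 1.6+ default) every query
# sees the latest committed data. Inside an explicit transaction:
with transaction.atomic():
    first = MyModel.objects.get(id=1).my_field
    # ... another process commits a change to this row here ...
    second = MyModel.objects.get(id=1).my_field
    # REPEATABLE-READ (MySQL default): second == first (snapshot from first read)
    # READ-COMMITTED: second reflects the other process's commit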
We've struggled a fair bit with forcing Django to refresh the "cache" - which, it turns out, wasn't really a cache at all but an artifact of transactions. This might not apply to your example, but certainly in Django views an implicit transaction is opened by default, which MySQL then isolates from any changes that happen in other processes after you start.
We used the @transaction.commit_manually decorator and called transaction.commit() just before every occasion where we needed up-to-date info.
As I say, this definitely applies to views; I'm not sure whether it would apply to Django code that isn't run inside a view.
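A minimal sketch of that pattern (pre-Django-1.6 API, with a hypothetical MyModel and handled field):

from django.db import transaction
from django.http import HttpResponse

from myapp.models import MyModel   # placeholder model


@transaction.commit_manually        # pre-1.6 API; deprecated in 1.6 and removed later
def unhandled_count(request):
    transaction.commit()            # end the implicit transaction so fresh rows are visible
    count = MyModel.objects.filter(handled=False).count()
    transaction.commit()            # commit_manually requires an explicit commit or rollback
    return HttpResponse(str(count))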
detailed info here:
http://devblog.resolversystems.com/?p=439