This is from the Django docs on the QuerySet iterator() method:
A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.
After reading, I'm still confused: the line about increased performance and memory reduction suggests we should just always use the iterator() method. Can someone give some examples of good and bad cases of iterator() usage?
Even if the query results are not cached, if they really wanted to access the models more than once, can't someone just do the following?
saved_queries = list(Model.objects.all().iterator())
Django's built-in solution to iterating through a large QuerySet is the QuerySet.iterator() method. This helps immensely and is probably good enough in most cases.
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
The Django web framework includes a default object-relational mapping layer (ORM) that can be used to interact with application data from various relational databases such as SQLite, PostgreSQL and MySQL. The Django ORM is an implementation of the object-relational mapping (ORM) concept.
Note the first part of the sentence you call out: "For a QuerySet which returns a large number of objects that you only need to access once".
So the converse of this is: if you need to re-use a set of results, and they are not so numerous as to cause a memory problem, then you should not use iterator(), because the extra database round trip is always going to reduce your performance vs. using the cached result.
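To make that trade-off concrete, here is a minimal sketch using only the standard library. ToyQuerySet is a hypothetical stand-in I made up for illustration, not Django's actual implementation; it only mimics the behaviour the docs describe (default iteration caches, iterator() does not) and counts how often the database is hit.

```python
import sqlite3


class ToyQuerySet:
    """Simplified stand-in for a Django QuerySet (NOT Django's real code)."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self._result_cache = None  # filled on the first full evaluation
        self.query_count = 0       # how many times the database was hit

    def iterator(self):
        # Stream rows straight from the cursor; nothing is stored on self.
        self.query_count += 1
        for row in self.conn.execute(self.sql):
            yield row

    def __iter__(self):
        # Default iteration: run the query once, cache the rows, reuse them.
        if self._result_cache is None:
            self._result_cache = list(self.iterator())
        return iter(self._result_cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

sql = "SELECT id, name FROM item ORDER BY id"

cached = ToyQuerySet(conn, sql)
first_pass = list(cached)
second_pass = list(cached)            # served from the cache, no new query
cached_queries = cached.query_count   # 1

streamed = ToyQuerySet(conn, sql)
pass_one = list(streamed.iterator())
pass_two = list(streamed.iterator())      # runs the query a second time
streamed_queries = streamed.query_count   # 2
```

In this sketch, re-using the results costs one query with the cache and two with iterator(), while iterator() never holds the full result set on the object. That is the whole trade: memory versus repeat round trips.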
You could force your QuerySet to be evaluated into a list, but there are drawbacks:
It requires more typing than just saved_queries = Model.objects.all().
QuerySets are lazy, so you can have a context processor, for instance, that puts a QuerySet into the context of every request, and it only gets evaluated when you access it on certain requests. If you've forced evaluation, that database hit happens on every request.
The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required), so generally the standard QuerySet behaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variable to get the benefit of the caching.
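The laziness point can be sketched the same way. LazyQuery below is a hypothetical toy class (not Django code) that records each SQL statement it actually runs, so you can see that building the object costs nothing, that evaluation happens on first use, and that wrapping it in list() forces the hit immediately.

```python
import sqlite3

executed = []  # every SQL statement that actually reaches the database


class LazyQuery:
    """Toy lazy query: holds the SQL, runs it only when results are needed."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self._rows = None

    def _fetch(self):
        # Hit the database at most once, then reuse the stored rows.
        if self._rows is None:
            executed.append(self.sql)
            self._rows = self.conn.execute(self.sql).fetchall()
        return self._rows

    def __iter__(self):
        return iter(self._fetch())

    def __len__(self):
        return len(self._fetch())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(1,), (2,), (3,)])

lazy = LazyQuery(conn, "SELECT id FROM item ORDER BY id")
runs_after_creation = len(executed)   # 0: nothing has run yet

row_count = len(lazy)                 # first real use triggers the query
rows = list(lazy)                     # reuses the stored rows
runs_after_use = len(executed)        # 1

forced = list(LazyQuery(conn, "SELECT id FROM item ORDER BY id"))
runs_after_forcing = len(executed)    # 2: list() evaluated it right away
```

If the lazy object were placed in every request's context and only some requests touched it, only those requests would pay for the query; the list() version pays every time.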
Good use of iterator: processing results that take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this is often in management commands when doing heavy data processing.
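As a rough illustration of that kind of one-pass batch job, here is a sketch with the standard-library sqlite3 module rather than Django itself. iter_rows is a hypothetical helper name; it pulls rows in fixed-size chunks so only one chunk is held at a time. In Django the analogous loop would be something like for obj in Model.objects.all().iterator(), and I believe iterator() also accepts a chunk_size argument in recent versions, but check the docs for the version you are on.

```python
import sqlite3


def iter_rows(conn, sql, chunk_size=1000):
    """Yield rows one by one, fetching at most chunk_size rows at a time."""
    cursor = conn.execute(sql)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO reading (value) VALUES (?)",
    ((n,) for n in range(10_000)),
)

# Single pass over every row; the full result set is never built as a list.
total = 0
row_count = 0
for _id, value in iter_rows(conn, "SELECT id, value FROM reading", chunk_size=500):
    total += value
    row_count += 1
```

Each row is visited exactly once and then dropped, which is the access pattern the docs have in mind for iterator().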
I agree with Steven and I would like to add an observation:
"it requires more typing than just saved_queries = Model.objects.all()". Yes, it does, but there is a more important difference to consider before using list(Model.objects.all()). For example, if you assign that to a variable, it executes the query immediately and stores the results. Imagine you have over 1M records: you will then have over 1M records in a list that you may or may not use right away. So, as Steven said, I would recommend using only Model.objects.all(), because when it is assigned to a variable it won't execute until you actually use the variable, which saves you DB calls.
You should also use prefetch_related() to avoid making too many calls to the database: it fetches related objects (including reverse lookups) in bulk instead of one query per object, which can save you a lot of time.
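To show why that matters, here is a sketch of the query pattern, again with plain sqlite3 rather than Django. The author/book tables and the run helper are made up for the example. The first half issues one query per parent row; the second half fetches all related rows in a single extra query and groups them in Python, which is roughly the idea behind prefetch_related().

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book (author_id, title) VALUES (1, 'A1'), (1, 'A2'), (2, 'B1');
""")

query_log = []  # one entry per statement sent to the database


def run(sql, params=()):
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# Naive pattern: one query for the authors, then one more per author.
authors = run("SELECT id, name FROM author ORDER BY id")
naive = {}
for author_id, name in authors:
    rows = run("SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,))
    naive[name] = [title for (title,) in rows]
naive_queries = len(query_log)  # 4 (1 + 3 authors)

# Bulk pattern: two queries in total, related rows grouped in Python.
query_log.clear()
authors = run("SELECT id, name FROM author ORDER BY id")
ids = [author_id for author_id, _ in authors]
placeholders = ",".join("?" * len(ids))
books = run(
    f"SELECT author_id, title FROM book WHERE author_id IN ({placeholders}) ORDER BY id",
    ids,
)
by_author = defaultdict(list)
for author_id, title in books:
    by_author[author_id].append(title)
bulk = {name: by_author[author_id] for author_id, name in authors}
bulk_queries = len(query_log)  # 2
```

Both halves produce the same mapping, but the naive version grows by one query per author while the bulk version stays at two.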