
When to use or not use iterator() in the Django ORM

This is from the Django docs on the QuerySet iterator() method:

A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.

After reading, I'm still confused: the line about increased performance and memory reduction suggests we should always use the iterator() method. Can someone give some examples of good and bad cases of iterator() usage?

Even if the query results are not cached, if someone really wanted to access the models more than once, couldn't they just do the following?

saved_queries = list(Model.objects.all().iterator()) 
Lucas Ou-Yang asked Oct 01 '12 21:10




2 Answers

Note the first part of the sentence you call out: "For a QuerySet which returns a large number of objects that you only need to access once".

So the converse of this is: if you need to re-use a set of results, and they are not so numerous as to cause a memory problem, then you should not use iterator(), because each additional pass costs an extra database round trip, which is slower than reading the cached result.
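To make the trade-off concrete, here is a toy model of the caching behaviour described in the docs quote. ToyQuerySet and its query_count attribute are made up for illustration; this is a sketch of the idea, not Django's actual implementation.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a QuerySet; NOT Django's real code."""

    def __init__(self, rows):
        self._rows = rows            # pretend this is the database table
        self._result_cache = None    # filled on the first full evaluation
        self.query_count = 0         # number of simulated database hits

    def _fetch(self):
        # Simulates one database round trip that streams rows.
        self.query_count += 1
        for row in self._rows:
            yield row

    def __iter__(self):
        # Default iteration: run the query once, then reuse the cache.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)

    def iterator(self):
        # Bypasses the cache: every call is a fresh query.
        return self._fetch()


cached = ToyQuerySet(range(5))
for _ in range(3):
    sum(cached)                      # three passes over the cached results
cached_queries = cached.query_count

streamed = ToyQuerySet(range(5))
for _ in range(3):
    sum(streamed.iterator())         # three passes, each one a new query
streamed_queries = streamed.query_count

print(cached_queries, streamed_queries)
```

In this model the cached version pays for one query and keeps every row in memory, while the iterator() version keeps nothing but pays for a query on every pass.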

You could force your QuerySet to be evaluated into a list but:

  • it requires more typing than just saved_queries = Model.objects.all()
  • say you are paginating results on a web page: you will have forced all results into memory (back to possible memory problems) rather than allowing the subsequent paginator to select the slice of 20 results it needs
  • QuerySets are lazy, so a context processor, for instance, can put a QuerySet into the context of every request and it is only evaluated on the requests that actually access it. If you have forced evaluation, that database hit happens on every request.
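The context processor point can be sketched with a small stand-in for a lazy QuerySet. LazyResults, load_articles, and both context processor functions are hypothetical names used only for this illustration; none of this is Django code.

```python
class LazyResults:
    """Hypothetical lazy container: runs its loader only when iterated."""

    def __init__(self, load):
        self._load = load      # callable that performs the "query"
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            self._cache = list(self._load())
        return iter(self._cache)


db_hits = []                   # records every simulated database query


def load_articles():
    db_hits.append("SELECT ... FROM article")
    return ["first", "second", "third"]


def lazy_context_processor():
    # Puts a lazy object in the context; no query has run yet.
    return {"articles": LazyResults(load_articles)}


def forced_context_processor():
    # list() forces evaluation, so the query runs on every request.
    return {"articles": list(LazyResults(load_articles))}


lazy_ctx = lazy_context_processor()
hits_after_lazy = len(db_hits)       # template never touched "articles"

forced_ctx = forced_context_processor()
hits_after_forced = len(db_hits)     # query already executed

titles = list(lazy_ctx["articles"])  # a template finally uses the data
hits_after_access = len(db_hits)
```

The lazy version only costs a query on requests whose templates actually iterate over "articles"; the forced version costs one on every request.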

The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required) so generally the standard QuerySet behaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variable to get the benefit of the caching.

Good use of iterator: processing results that take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this is often in management commands when doing heavy data processing.
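The heavy data processing case can be sketched without Django by using the standard library's sqlite3 module. The item table, the stream_rows helper, and the chunk size of 2000 are assumptions made for this example. In Django 2.0 and later the rough equivalent would be looping over Model.objects.all().iterator(chunk_size=2000).

```python
import sqlite3

# In-memory database with a made-up "item" table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany(
    "INSERT INTO item (price) VALUES (?)",
    [(n,) for n in range(1, 10001)],
)


def stream_rows(connection, sql, chunk_size=2000):
    """Yield rows one by one, fetching at most chunk_size rows at a time."""
    cursor = connection.execute(sql)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk


# Single pass over all rows; no list of 10,000 rows is ever built here.
total = 0
row_count = 0
for (price,) in stream_rows(conn, "SELECT price FROM item"):
    total += price
    row_count += 1

print(row_count, total)
```

This fits the "access once" pattern from the docs: each row is processed and then dropped, so peak memory in the Python code is bounded by the chunk size rather than by the size of the table. How much the database driver itself buffers depends on the backend.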

Steven answered Sep 29 '22 22:09


I agree with Steven and I would like to add a couple of observations:

  • "it requires more typing than just saved_queries = Model.objects.all()". Yes, it does, but there is a more important difference to weigh before you use list(Model.objects.all()). When you assign the list(...) call to a variable, the query executes immediately and the results are stored there. Imagine you have over 1M records: you would then hold over 1M records in a list that you may or may not use right away. So, as Steven said, I would recommend assigning just Model.objects.all() to the variable, because the query won't execute until the variable is actually evaluated (for example, iterated over), saving you DB calls.

  • You should use prefetch_related() to avoid making too many calls to the database. It fetches related objects (including reverse lookups) in bulk instead of running one query per object, which can save you a lot of time.

Tiago Silva answered Sep 29 '22 20:09