itertools.groupby in a django template

Tags: python, group-by, django, itertools, django-templates

I'm having an odd problem using itertools.groupby to group the elements of a queryset. I have a model Resource:

from django.db import models 

TYPE_CHOICES = ( 
    ('event', 'Event Room'),
    ('meet', 'Meeting Room'),
    # etc 
)   

class Resource(models.Model):
    name = models.CharField(max_length=30)
    type = models.CharField(max_length=5, choices=TYPE_CHOICES)
    # other stuff

I have a few resources in my SQLite database:

>>> from myapp.models import Resource
>>> r = Resource.objects.all()
>>> len(r)
3
>>> r[0].type
u'event'
>>> r[1].type
u'meet'
>>> r[2].type
u'meet'

So if I group by type, I naturally get two tuples:

>>> from itertools import groupby
>>> g = groupby(r, lambda resource: resource.type)
>>> for type, resources in g:
...   print type
...   for resource in resources:
...     print '\t%s' % resource
event
    resourcex
meet
    resourcey
    resourcez

Now I have the same logic in my view:

class DayView(DayArchiveView):
    def get_context_data(self, *args, **kwargs):
        context = super(DayView, self).get_context_data(*args, **kwargs)
        types = dict(TYPE_CHOICES)
        context['resource_list'] = groupby(Resource.objects.all(), lambda r: types[r.type])
        return context

But when I iterate over this in my template, some resources are missing:

<select multiple="multiple" name="resources">
{% for type, resources in resource_list %}
    <option disabled="disabled">{{ type }}</option>
    {% for resource in resources %}
        <option value="{{ resource.id }}">{{ resource.name }}</option>
    {% endfor %}
{% endfor %}
</select>

This renders as:

[image: select multiple (https://i.stack.imgur.com/k15h8.png)]

I suspect the sub-iterators have somehow already been iterated over, but I'm not sure how that could happen.

(Using Python 2.7.1, Django 1.3.)

(EDIT: If anyone reads this, I'd recommend using the built-in regroup template tag instead of using groupby.)

961

asked Aug 02 '11 02:08

Ismail Badawi

2 Answers

Django's templates want to know the length of things that are looped over using {% for %}, but generators don't have a length.

So Django converts the generator to a list before iterating, so that it can call len() on it.

This breaks iterators created by itertools.groupby, because every group shares the same underlying iterator: if you don't consume a group before moving on to the next one, its contents are lost. Here is an example from Django core developer Alex Gaynor. First, the normal groupby:

>>> groups = itertools.groupby(range(10), lambda x: x < 5)
>>> print [list(items) for g, items in groups]
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

Here is what Django does; it converts the generator to a list:

>>> groups = itertools.groupby(range(10), lambda x: x < 5)
>>> groups = list(groups)
>>> print [list(items) for g, items in groups]
[[], [9]]
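The effect can be reproduced without Django. The sketch below assumes, as described above, that the {% for %} tag calls list() on anything lacking __len__. The function template_for is a hypothetical stand-in for that behaviour, not Django's actual code, and the sample data is made up to mirror the question.

```python
import itertools


def template_for(values):
    """Hypothetical stand-in for the {% for %} tag's preprocessing."""
    # Assumption taken from the answer above: objects without __len__
    # are converted to a list before the loop body runs.
    if not hasattr(values, '__len__'):
        values = list(values)
    # Only now is each inner group consumed, as a nested {% for %} would do.
    return [(key, list(group)) for key, group in values]


data = ['event', 'meet', 'meet']

# A bare groupby object has no __len__, so it is listed first and the
# earlier groups come back empty.
lazy = template_for(itertools.groupby(data))

# Groups materialised up front survive the same treatment.
eager = template_for([(k, list(g)) for k, g in itertools.groupby(data)])
```

Here eager holds both groups intact, while the first group in lazy is empty. What survives in the last lazy group varies between Python versions, so it is best not to rely on it.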

There are two ways around this: convert to a list before Django does or prevent Django from doing it.

Converting into a list yourself

Consume each group as you reach it, for example:

[(grouper, list(values)) for grouper, values in my_groupby_generator]

Of course, you then lose the advantages of a generator, which may or may not matter to you.
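Applied to the question, this could look roughly like the sketch below. It uses plain dictionaries as hypothetical stand-ins for the Resource rows so that it runs without Django, and grouped_as_lists is an illustrative helper name, not part of any library. It also sorts by the grouping key first, since groupby only merges adjacent items.

```python
import itertools

# Hypothetical stand-ins for TYPE_CHOICES and the Resource rows in the question.
TYPE_CHOICES = (('event', 'Event Room'), ('meet', 'Meeting Room'))
rows = [
    {'name': 'resourcex', 'type': 'event'},
    {'name': 'resourcey', 'type': 'meet'},
    {'name': 'resourcez', 'type': 'meet'},
]


def grouped_as_lists(items, key):
    """Return [(key, [items...]), ...] with every group fully materialised."""
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(items, key=key)
    return [(k, list(group)) for k, group in itertools.groupby(ordered, key)]


types = dict(TYPE_CHOICES)
resource_list = grouped_as_lists(rows, key=lambda r: types[r['type']])
```

In the view, the result would be assigned to context['resource_list']. Because it is a plain list of (label, list) pairs, the template can loop over it any number of times.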

Preventing Django from converting into a list

The other way around this is to wrap it in an object that provides a __len__ method (if you know what the length will be):

class MyGroupedItems(object):
    def __iter__(self):
        return itertools.groupby(range(10), lambda x: x < 5)

    def __len__(self):
        return 2

Django will be able to get the length using len() and will not need to convert your generator into a list. It's unfortunate that Django does this. I was lucky that I could use this workaround, as I was already using such an object and knew what the length would always be.
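A slightly more general version of the same idea is sketched below. SizedGroups and its parameters are hypothetical names chosen for illustration, and the caller still has to know the number of groups in advance.

```python
import itertools


class SizedGroups(object):
    """Hypothetical wrapper: a lazy groupby that also reports a length."""

    def __init__(self, make_iterable, key, length):
        # make_iterable is a zero-argument callable, so each call to
        # __iter__ starts from a fresh source instead of a spent iterator.
        self.make_iterable = make_iterable
        self.key = key
        self.length = length  # must be known in advance by the caller

    def __iter__(self):
        return itertools.groupby(self.make_iterable(), self.key)

    def __len__(self):
        return self.length


groups = SizedGroups(lambda: range(10), lambda x: x < 5, length=2)

# len() works on the wrapper, so nothing needs to call list() on it, and
# each group can be consumed while it is still the current one.
has_len = hasattr(groups, '__len__')
result = [(key, list(items)) for key, items in groups]
```

Passing a callable rather than an iterator is a deliberate choice here: it lets the wrapper be iterated more than once, which a single groupby object would not allow.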

184

answered Oct 30 '22 01:10

Will Hardy

I think that you're right. I don't understand why, but it looks to me like your groupby iterator is being pre-iterated. It's easier to explain with code:

>>> even_odd_key = lambda x: x % 2
>>> evens_odds = sorted(range(10), key=even_odd_key)
>>> evens_odds_grouped = itertools.groupby(evens_odds, key=even_odd_key)
>>> [(k, list(g)) for k, g in evens_odds_grouped]
[(0, [0, 2, 4, 6, 8]), (1, [1, 3, 5, 7, 9])]

So far, so good. But what happens when we try to store the contents of the iterator in a list?

>>> evens_odds_grouped = itertools.groupby(evens_odds, key=even_odd_key)
>>> groups = [(k, g) for k, g in evens_odds_grouped]
>>> groups
[(0, <itertools._grouper object at 0x1004d7110>), (1, <itertools._grouper object at 0x1004ccbd0>)]

Surely we've just cached the results, and the iterators are still good. Right? Wrong.

>>> [(k, list(g)) for k, g in groups]
[(0, []), (1, [9])]

In the process of acquiring the keys, the groups are also iterated over. So we've really just cached the keys and thrown the groups away, save the very last item.

I don't know how Django handles iterators, but based on this, my hunch is that it caches them as lists internally. You could at least partially confirm this by repeating the experiment above with more resources. If the only resource displayed is the last one, then you are almost certainly hitting this problem somewhere.

answered Oct 29 '22 23:10

senderle
