Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django equivalent of sum+case

I have a bunch of django models

class ReviewItem(Model):
  review = models.ForegnKey("Review")
  person = models.ForeignKey("Person")
  category = models.ForeignKey("Category")
  item = models.ForeignKey("item")
  reviewed = models.DateTimeField(null=True)

class Person(Model):
  name = models.CharField(max_length=255)

class Category(Model):
  name = models.CharField(max_length=127)

class Item(Model):
  name = models.CharField(max_length=127)
  category = models.ForeignKey("Category")

(As you can see, the "Category" fk in ReviewItem is redundant)

There will be at most NxM ReviewItem records where N is the number of people and M is the number of items that they might have assigned to them, and they will have their "reviewed" date set after they've been reviewed. Items are grouped into categories.

What I want is a count of how many items for each item have been reviewed and how many have not. In SQL, I could do

select category.name, item.name,
sum(case when reviewed is null then 1 else 0 end) as un_reviewed
sum(case when reviewed is null then 0 else 1 end) as reviewed
from reviewitem
join  category on category.id = reviewitem.category_id
join  item on item.id = reviewitem.item_id
group by category.id, item.id
order by category.name, item.name

I can't figure out how to do this without doing two separate QuerySets in django.

Doing it with two QuerySets, I ended up with:

uncompleted_items = Item.objects.filter(
    reviewitem__review=current_review,
    reviewitem__person__reports_to=eff_user,
    reviewitem__reviewed__isnull=True
).select_related(
    'category',
).annotate(num_uncompleted=Count('reviewitem'))

completed_items = Item.objects.filter(
    reviewitem__review=current_review,
    reviewitem__person__reports_to=eff_user,
    reviewitem__reviewed__isnull=False
).select_related(
    'category',
).annotate(num_completed=Count('reviewitem'))
like image 735
Paul Tomblin Avatar asked Jun 24 '13 18:06

Paul Tomblin


4 Answers

With Django 1.8, this can be done using Conditional Expressions.

So your example query might look like this:

from django.db.models import When, Case, Sum, IntegerField

items = Item.objects.annotate(
    un_reviewed=Sum(Case(When(reviewed__isnull=True, then=1)
                         When(reviewed__isnull=False, then=0),
                         output_field=IntegerField())),
    reviewed=Sum(Case(When(reviewed__isnull=True, then=0)
                      When(reviewed__isnull=False, then=1),
                      output_field=IntegerField())))
like image 131
congusbongus Avatar answered Oct 21 '22 15:10

congusbongus


Although it's less than convenient to do so, it's not impossible with the Django ORM to do this :)

Ok, the hacky solution didn't work so here's the pretty solution ;)

from django.db import models
from test_models.models import ReviewItem


class CountNullSql(models.sql.aggregates.Count):
    sql_template = '%(function)s(%(distinct)s%(field)s IS NULL)'


class CountNotNullSql(CountNullSql):
    sql_template = '%(function)s(%(distinct)s%(field)s IS NOT NULL)'


class CountNull(models.Count):
    sql = CountNullSql

    def add_to_query(self, query, alias, col, source, is_summary):
        aggregate = self.sql(
            col,
            source=source,
            is_summary=is_summary,
            **self.extra)
        query.aggregates[alias] = aggregate

    def _default_alias(self):
        return '%s__%s' % (self.lookup, self.sql.__class__.__name__.lower())

    default_alias = property(_default_alias)


class CountNotNull(CountNull):
    sql = CountNotNullSql


items = (ReviewItem.objects
    .values(
        'item__category__name',
        'item__name',
    ).annotate(
        unreviewed=CountNull('reviewed'),
        reviewed=CountNotNull('reviewed'),
    )
)


# Just debug stuff from here on, might be useful for others :)    
sql, params = items.query.sql_with_params()

try:
    import sqlparse
    sql = sqlparse.format(sql, reindent=True)
except ImportError:
    pass

try:
    from pygments import highlight
    from pygments.lexers import SqlLexer
    from pygments.formatters import Terminal256Formatter
    sql = highlight(sql, SqlLexer(), Terminal256Formatter(style='colorful'))
except ImportError:
    pass

print sql

The resulting query:

SELECT "test_models_category"."name",
       "test_models_item"."name",
       COUNT("test_models_reviewitem"."reviewed" IS NULL) AS "unreviewed",
       COUNT("test_models_reviewitem"."reviewed" IS NOT NULL) AS "reviewed"
FROM "test_models_reviewitem"
INNER JOIN "test_models_item" ON ("test_models_reviewitem"."item_id" = "test_models_item"."id")
INNER JOIN "test_models_category" ON ("test_models_item"."category_id" = "test_models_category"."id")
GROUP BY "test_models_category"."name",
         "test_models_item"."name"

Example result:

[{'item__category__name': u'cat a',
  'item__name': u'aa',
  'reviewed': 1,
  'unreviewed': 1},
 {'item__category__name': u'cat a',
  'item__name': u'ab',
  'reviewed': 1,
  'unreviewed': 1},
 {'item__category__name': u'cat b',
  'item__name': u'ba',
  'reviewed': 1,
  'unreviewed': 1},
 {'item__category__name': u'cat b',
  'item__name': u'bb',
  'reviewed': 1,
  'unreviewed': 1}]

Update from Paul Tomblin The CountNull and CountNotNull methods as described didn't work (evidently booleans count as 1 whether they're true or false), so I changed them as follows:

from django.db import models


class CountNullSql(models.sql.aggregates.Sum):
    sql_template = '%(function)s((%(field)s IS NULL)::integer)'


class CountNotNullSql(CountNullSql):
    sql_template = '%(function)s((%(field)s IS NOT NULL)::integer)'


class CountNull(models.Sum):
    sql = CountNullSql

    def add_to_query(self, query, alias, col, source, is_summary):
        aggregate = self.sql(
            col,
            source=source,
            is_summary=is_summary,
            **self.extra)
        query.aggregates[alias] = aggregate

    def _default_alias(self):
        return '%s__%s' % (self.lookup, self.sql.__class__.__name__.lower())

    default_alias = property(_default_alias)


class CountNotNull(CountNull):
    sql = CountNotNullSql
like image 25
Wolph Avatar answered Oct 21 '22 14:10

Wolph


This can be done directly with in-built support in Django 1.8. See https://docs.djangoproject.com/en/dev/ref/models/conditional-expressions/.

like image 3
akaariai Avatar answered Oct 21 '22 16:10

akaariai


Try this:

params = {'ri_table': ReviewItem._meta.db_table, 'i_table': Item._meta.db_table}
reviewed_query = 'SELECT COUNT(*) FROM %(ri_table)s WHERE %(ri_table)s.item_id=%(i_table)s.id AND reviewed IS NOT NULL' % params
unreviewed_query = 'SELECT COUNT(*) FROM %(ri_table)s WHERE %(ri_table)s.item_id=%(i_table)s.id AND reviewed IS NULL' % params
items = Item.objects.extra(select={'reviewed': reviewed_query, 'unreviewed': unreviewed_query})

This returns a queryset of Items that are essentially annotated with the count of reviews and unreviews. Each Item is a model object so from there you can get the Category and so on. print items.query will show you the SQL statement, but of course further Django magic happens to build the Model instance objects.

Collecting all this data in a single query is not elegant with the Django ORM (to be fair, it's also non-trivial SQL). It's complex enough that I don't think it's worth doing, at least not initially. Simpler code in the Django ORM paradigm will be easier to understand, faster to code and more maintainable. It may (or may not) require more hits to the database but such worry may be premature optimization. Such things as schema normalization may net you greater performance gains. Depending on your use, grabbing the counts like this may actually be faster.

reviewed = item_obj.reviewitem_set.exclude(reviewed__isnull=True).count()
unreviewed = item_obj.reviewitem_set.filter(reviewed__isnull=True).count()
like image 1
JCotton Avatar answered Oct 21 '22 15:10

JCotton