I have a bunch of django models
class ReviewItem(Model):
review = models.ForegnKey("Review")
person = models.ForeignKey("Person")
category = models.ForeignKey("Category")
item = models.ForeignKey("item")
reviewed = models.DateTimeField(null=True)
class Person(Model):
name = models.CharField(max_length=255)
class Category(Model):
name = models.CharField(max_length=127)
class Item(Model):
name = models.CharField(max_length=127)
category = models.ForeignKey("Category")
(As you can see, the "Category" fk in ReviewItem is redundant)
There will be at most NxM ReviewItem records where N is the number of people and M is the number of items that they might have assigned to them, and they will have their "reviewed" date set after they've been reviewed. Items are grouped into categories.
What I want is a count of how many items for each item have been reviewed and how many have not. In SQL, I could do
select category.name, item.name,
sum(case when reviewed is null then 1 else 0 end) as un_reviewed
sum(case when reviewed is null then 0 else 1 end) as reviewed
from reviewitem
join category on category.id = reviewitem.category_id
join item on item.id = reviewitem.item_id
group by category.id, item.id
order by category.name, item.name
I can't figure out how to do this without doing two separate QuerySets in django.
Doing it with two QuerySets, I ended up with:
uncompleted_items = Item.objects.filter(
reviewitem__review=current_review,
reviewitem__person__reports_to=eff_user,
reviewitem__reviewed__isnull=True
).select_related(
'category',
).annotate(num_uncompleted=Count('reviewitem'))
completed_items = Item.objects.filter(
reviewitem__review=current_review,
reviewitem__person__reports_to=eff_user,
reviewitem__reviewed__isnull=False
).select_related(
'category',
).annotate(num_completed=Count('reviewitem'))
With Django 1.8, this can be done using Conditional Expressions.
So your example query might look like this:
from django.db.models import When, Case, Sum, IntegerField
items = Item.objects.annotate(
un_reviewed=Sum(Case(When(reviewed__isnull=True, then=1)
When(reviewed__isnull=False, then=0),
output_field=IntegerField())),
reviewed=Sum(Case(When(reviewed__isnull=True, then=0)
When(reviewed__isnull=False, then=1),
output_field=IntegerField())))
Although it's less than convenient to do so, it's not impossible with the Django ORM to do this :)
Ok, the hacky solution didn't work so here's the pretty solution ;)
from django.db import models
from test_models.models import ReviewItem
class CountNullSql(models.sql.aggregates.Count):
sql_template = '%(function)s(%(distinct)s%(field)s IS NULL)'
class CountNotNullSql(CountNullSql):
sql_template = '%(function)s(%(distinct)s%(field)s IS NOT NULL)'
class CountNull(models.Count):
sql = CountNullSql
def add_to_query(self, query, alias, col, source, is_summary):
aggregate = self.sql(
col,
source=source,
is_summary=is_summary,
**self.extra)
query.aggregates[alias] = aggregate
def _default_alias(self):
return '%s__%s' % (self.lookup, self.sql.__class__.__name__.lower())
default_alias = property(_default_alias)
class CountNotNull(CountNull):
sql = CountNotNullSql
items = (ReviewItem.objects
.values(
'item__category__name',
'item__name',
).annotate(
unreviewed=CountNull('reviewed'),
reviewed=CountNotNull('reviewed'),
)
)
# Just debug stuff from here on, might be useful for others :)
sql, params = items.query.sql_with_params()
try:
import sqlparse
sql = sqlparse.format(sql, reindent=True)
except ImportError:
pass
try:
from pygments import highlight
from pygments.lexers import SqlLexer
from pygments.formatters import Terminal256Formatter
sql = highlight(sql, SqlLexer(), Terminal256Formatter(style='colorful'))
except ImportError:
pass
print sql
The resulting query:
SELECT "test_models_category"."name",
"test_models_item"."name",
COUNT("test_models_reviewitem"."reviewed" IS NULL) AS "unreviewed",
COUNT("test_models_reviewitem"."reviewed" IS NOT NULL) AS "reviewed"
FROM "test_models_reviewitem"
INNER JOIN "test_models_item" ON ("test_models_reviewitem"."item_id" = "test_models_item"."id")
INNER JOIN "test_models_category" ON ("test_models_item"."category_id" = "test_models_category"."id")
GROUP BY "test_models_category"."name",
"test_models_item"."name"
Example result:
[{'item__category__name': u'cat a',
'item__name': u'aa',
'reviewed': 1,
'unreviewed': 1},
{'item__category__name': u'cat a',
'item__name': u'ab',
'reviewed': 1,
'unreviewed': 1},
{'item__category__name': u'cat b',
'item__name': u'ba',
'reviewed': 1,
'unreviewed': 1},
{'item__category__name': u'cat b',
'item__name': u'bb',
'reviewed': 1,
'unreviewed': 1}]
Update from Paul Tomblin The CountNull and CountNotNull methods as described didn't work (evidently booleans count as 1 whether they're true or false), so I changed them as follows:
from django.db import models
class CountNullSql(models.sql.aggregates.Sum):
sql_template = '%(function)s((%(field)s IS NULL)::integer)'
class CountNotNullSql(CountNullSql):
sql_template = '%(function)s((%(field)s IS NOT NULL)::integer)'
class CountNull(models.Sum):
sql = CountNullSql
def add_to_query(self, query, alias, col, source, is_summary):
aggregate = self.sql(
col,
source=source,
is_summary=is_summary,
**self.extra)
query.aggregates[alias] = aggregate
def _default_alias(self):
return '%s__%s' % (self.lookup, self.sql.__class__.__name__.lower())
default_alias = property(_default_alias)
class CountNotNull(CountNull):
sql = CountNotNullSql
This can be done directly with in-built support in Django 1.8. See https://docs.djangoproject.com/en/dev/ref/models/conditional-expressions/.
Try this:
params = {'ri_table': ReviewItem._meta.db_table, 'i_table': Item._meta.db_table}
reviewed_query = 'SELECT COUNT(*) FROM %(ri_table)s WHERE %(ri_table)s.item_id=%(i_table)s.id AND reviewed IS NOT NULL' % params
unreviewed_query = 'SELECT COUNT(*) FROM %(ri_table)s WHERE %(ri_table)s.item_id=%(i_table)s.id AND reviewed IS NULL' % params
items = Item.objects.extra(select={'reviewed': reviewed_query, 'unreviewed': unreviewed_query})
This returns a queryset of Items that are essentially annotated with the count of reviews and unreviews. Each Item is a model object so from there you can get the Category and so on. print items.query
will show you the SQL statement, but of course further Django magic happens to build the Model instance objects.
Collecting all this data in a single query is not elegant with the Django ORM (to be fair, it's also non-trivial SQL). It's complex enough that I don't think it's worth doing, at least not initially. Simpler code in the Django ORM paradigm will be easier to understand, faster to code and more maintainable. It may (or may not) require more hits to the database but such worry may be premature optimization. Such things as schema normalization may net you greater performance gains. Depending on your use, grabbing the counts like this may actually be faster.
reviewed = item_obj.reviewitem_set.exclude(reviewed__isnull=True).count()
unreviewed = item_obj.reviewitem_set.filter(reviewed__isnull=True).count()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With