 

Efficient way to use filter() twice in Django

I am relatively new to Django and Python, and I have not quite been able to figure this one out.

I essentially want to query the database using filter() for a large number of users. Then I want to make a bunch of queries on just this subset of users. So I thought it would be most efficient to first query with my broader filter parameters, and then run my separate filter queries on that set. In code, it looks like this:

# Get the big group of users, e.g. all people with brown hair.
group_of_users = Data.objects.filter(......)

# Now get all the people with brown hair and blue eyes, then all with green eyes, etc.
for each eye color:
    subset_of_group = group_of_users.filter(....)

That is just pseudo-code, by the way; I am not that inept. I thought this would be more efficient, but when I eliminate the first query and simply build each queryset inside the for loop, it is much faster (I actually timed it). I fear this is because, when I filter first and then filter again in each loop iteration, both sets of filter conditions are actually executed on every iteration, so it is really doing twice the work I want. I thought caching would prevent this, since the results of the first filter would be cached and the approach would still be faster, but I timed it over multiple tests and the single-filter version is faster. Any ideas?

EDIT: So it seems that querying for a set of data and then further querying only against that set is not possible. Instead, I should query for the set of data once and then process it further in regular Python.

Cole Canning asked Oct 19 '25 14:10



1 Answer

As garnertb and lanzz said, it doesn't matter where you call the filter function; the only thing that matters is when you evaluate the query (see "When QuerySets are evaluated" in the Django documentation). My guess is that your tests evaluate the queryset somewhere in your code, and that the version with separate filter calls does more evaluations, and therefore more database queries.
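To illustrate the point about evaluation without needing a database, here is a toy model in plain Python. `FakeQuerySet`, its `queries_run` counter and the sample rows are hypothetical and only mimic the behaviour described above; this is not how Django implements querysets. Calling `filter()` just returns a new object with one more condition, and the simulated query only runs when the object is iterated:

```python
class FakeQuerySet:
    """Toy stand-in for a lazy queryset (hypothetical, not Django's code)."""

    queries_run = 0  # counts simulated database hits across all instances

    def __init__(self, rows, conditions=()):
        self._rows = rows  # plays the role of the database table
        self._conditions = tuple(conditions)

    def filter(self, **kwargs):
        # Chaining only records another condition; nothing is executed here.
        return FakeQuerySet(self._rows, self._conditions + tuple(kwargs.items()))

    def __iter__(self):
        # Evaluation: this is where a real queryset would hit the database.
        FakeQuerySet.queries_run += 1
        matches = [
            row for row in self._rows
            if all(row.get(field) == value for field, value in self._conditions)
        ]
        return iter(matches)


table = [
    {"name": "Ann", "hair": "brown", "eyes": "blue"},
    {"name": "Bob", "hair": "brown", "eyes": "green"},
    {"name": "Cy", "hair": "black", "eyes": "blue"},
]

brown = FakeQuerySet(table).filter(hair="brown")  # no query yet
brown_blue = brown.filter(eyes="blue")            # still no query
print(FakeQuerySet.queries_run)                   # 0
print([row["name"] for row in brown_blue])        # one query: ['Ann']
print(FakeQuerySet.queries_run)                   # 1
```

Note that `brown_blue` re-applies the `hair="brown"` condition itself when it is evaluated: chaining `filter()` builds one combined query rather than narrowing down results that were fetched earlier.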

Whenever a queryset is evaluated, its results are cached. However, this cache does not carry over to the new queryset returned when you call another method, such as filter() or order_by(), on it. So you can't evaluate the bigger set and then use filter() on that queryset to retrieve the smaller sets without doing another query.
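Extending the same kind of toy model with a per-instance result cache shows why caching does not help here. `CachingFakeQuerySet` and the sample rows are hypothetical and are not Django's internals; the sketch only assumes the behaviour described above: the object returned by `filter()` starts with an empty cache, while filtering the already-evaluated results in Python costs no extra query.

```python
class CachingFakeQuerySet:
    """Toy lazy queryset with a per-instance result cache (hypothetical)."""

    queries_run = 0  # counts simulated database hits

    def __init__(self, rows, conditions=()):
        self._rows = rows
        self._conditions = tuple(conditions)
        self._cache = None  # filled on first evaluation, reused afterwards

    def filter(self, **kwargs):
        # The derived object gets a fresh, empty cache of its own.
        return CachingFakeQuerySet(self._rows, self._conditions + tuple(kwargs.items()))

    def __iter__(self):
        if self._cache is None:
            CachingFakeQuerySet.queries_run += 1  # simulated database hit
            self._cache = [
                row for row in self._rows
                if all(row.get(f) == v for f, v in self._conditions)
            ]
        return iter(self._cache)


table = [
    {"name": "Ann", "hair": "brown", "eyes": "blue"},
    {"name": "Bob", "hair": "brown", "eyes": "green"},
    {"name": "Cy", "hair": "black", "eyes": "blue"},
]

brown = CachingFakeQuerySet(table).filter(hair="brown")
list(brown)                      # first query; results cached on `brown`
list(brown)                      # served from the cache, no new query
blue = brown.filter(eyes="blue")
list(blue)                       # new object with an empty cache: second query
print(CachingFakeQuerySet.queries_run)  # 2

# Filtering the already-evaluated results in Python needs no further query.
green = [row for row in brown if row["eyes"] == "green"]
print([row["name"] for row in green], CachingFakeQuerySet.queries_run)  # ['Bob'] 2
```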

If you only have a small set of eye colours, you can get away with doing a query for each eye colour. However, if you have many of them, the number of queries will have a severe impact on performance. In that case it might be better to do a single query for the full set of users you want to use, and then do the subsequent processing in Python:

# Data is the model from the question; this runs one query for all brown-haired users.
qs = Data.objects.filter(hair='brown')

# Evaluate the queryset once and bucket the results by eye colour in Python.
objects = dict()
for obj in qs:
    objects.setdefault(obj.eyecolour, []).append(obj)

for k, v in objects.items():
    print("Objects for colour '%s':" % k)
    for obj in v:
        print("- %s" % obj)
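The same single-pass grouping can also be written with `collections.defaultdict`. In this sketch, plain `Person` namedtuples stand in for model instances that were already fetched by one query; `Person`, `group_by` and the `hair`/`eyes` field names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for model instances already fetched by a single query,
# e.g. the brown-haired users; in Django these would come from iterating a queryset.
Person = namedtuple("Person", ["name", "hair", "eyes"])

brown_haired = [
    Person("Ann", "brown", "blue"),
    Person("Bob", "brown", "green"),
    Person("Dee", "brown", "blue"),
]


def group_by(objects, attr):
    """Bucket objects by the value of `attr` in a single pass over the list."""
    groups = defaultdict(list)
    for obj in objects:
        groups[getattr(obj, attr)].append(obj)
    return dict(groups)


by_eyes = group_by(brown_haired, "eyes")
for colour, people in sorted(by_eyes.items()):
    print("Objects for colour '%s':" % colour)
    for person in people:
        print("- %s" % person.name)
```

Compared with `setdefault(key, [])`, a `defaultdict` only creates the empty list when a key is seen for the first time, instead of building a throwaway list on every call; either way, no further database queries are made once the objects are in memory.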
knbk answered Oct 22 '25 04:10



