Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to improved query performance in Django admin search on related fields (MySQL)

In Django I have this:

models.py

class Book(models.Model):
    isbn = models.CharField(max_length=16, db_index=True)
    title = models.CharField(max_length=255, db_index=True)
    ... other fields ...

class Author(models.Model):
    first_name = models.CharField(max_length=128, db_index=True)
    last_name = models.CharField(max_length=128, db_index=True)
    books = models.ManyToManyField(Book, blank=True)
    ... other fields ...

admin.py

class AuthorAdmin(admin.ModelAdmin):
    search_fields = ('first_name', 'last_name', 'books__isbn', 'books__title')

    ...

My problem is when I do a search from the Author admin list page with 2 or more short terms, MySQL start to take a lot of time (at least 8 sec. for a 3 terms query). I have around 5000 Authors and 2500 Books. The short here is very important. If I search for 'a b c', so 3 really short terms, I'm not enough patient to wait for the result (I waited at least 2 min.). Instead if I search for 'all bee clue' I got the result in 2 sec. So the problem look the be really with short terms on related fields.

The SQL query resulting from this search have a lot of JOIN, LIKE, AND and OR but no subquery.

I'm using MySQL 5.1 but I tried with 5.5 with no more success.

I also tried to increase the innodb_buffer_pool_size to a really large value. That change nothing.

The only idea I have right now to improve the performance is to denormalize to isbn and title field (ie copy them directly into Authors) but I will have to add a bunch of mechanics to keep these fields in sync with the real ones in Book.

Any suggestions on how to improve this query?

like image 930
Etienne Avatar asked Dec 28 '22 04:12

Etienne


1 Answers

After a lot of investigations I found that the problem come from how the search query is built for the admin search field (in the ChangeList class). In a multi-terms search (words separated by space) each term is added to the QuerySet by chaining a new filter(). When there's one or more related fields in the search_fields, the created SQL query will have a lot of JOIN chained one after the other with many JOIN for each related field (see my related question for some examples and more info). This chain of JOIN is there so that each term will be search only in the subset of data filter by the precedent term AND, most important, that a related field need to only have one term (vs needing to have ALL terms) to make a match. See Spanning multi-valued relationships in the Django docs for more info on this subject. I'm pretty sure it's the behavior wanted most of the time for the admin search field.

The drawback of this query (with related fields involved) is that the variation in performance (time to perform the query) can be really large. It depends on a lot of factors: number of searched terms, terms searched, kind of field search (VARCHAR, etc.), number of field search, data in the tables, size of the tables, etc. With the right combination it's easy to have a query that will take mostly forever (a query that take more then 10 min. for me is a query that take forever in the context of this search field).

The reason why it can take so long is that the database need to create a temporary table for each term and scan it mostly entirely to search for the next term. So, this adds up really quickly.

A possible change to do to improve the performance is to ANDed all terms in the same filter(). This way their will be only one JOIN by related field (or 2 if it's a many to many) instead of many more. This query will be a lot faster and with really small performance variation. The drawback is that related fields will have to have ALL the terms to match, so, you can get less matches in many cases.

UPDATE

As asked by trinchet here’s what’s needed to do the change of search behavior (for Django 1.7). You need to override the get_search_results() of the admin classes where you want this kind of search. You need to copy all the method code from the base class (ModelAdmin) to your own class. Then you need to change those lines:

for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    queryset = queryset.filter(reduce(operator.or_, or_queries))

To that:

and_queries = []
for bit in search_term.split():
    or_queries = [models.Q(**{orm_lookup: bit})
                  for orm_lookup in orm_lookups]
    and_queries.append(Q(reduce(operator.or_, or_queries)))
queryset = queryset.filter(reduce(operator.and_, and_queries))

This code is not tested. My original code was for Django 1.4 and I just adapt it for 1.7 here.

like image 85
Etienne Avatar answered Jan 29 '23 22:01

Etienne