Reducing Django Database Queries

I have a very large and growing dataset, and I need to create many filters, but this is quickly going to get out of control. I was hoping someone could help me combine some of these queries into a single call. Below is the start of my view.

Call #1 - for loop to display table of all results

traffic = Traffic.objects.all()

Call #2 - Combined aggregate sum query

from django.db.models import Sum

totals = Traffic.objects.aggregate(Sum('sessions'), Sum('new_users'), Sum('reminder'), Sum('campaigns'), Sum('new_sales'), Sum('sales_renewals'))
total_sessions = totals.get('sessions__sum')
total_new_users = totals.get('new_users__sum')
total_reminder = totals.get('reminder__sum')
total_campaigns = totals.get('campaigns__sum')
total_new_sales = totals.get('new_sales__sum')
total_sales_renewals = totals.get('sales_renewals__sum')

Call #3, #4, #5, #6 and so on... - To filter the database by year and day of week

# Django's week_day lookup: 1 = Sunday, 2 = Monday, ..., 7 = Saturday
total_sessions_2014_mon = Traffic.objects.filter(created__year=2014, created__week_day=2).aggregate(Sum('sessions'))

total_sessions_2014_tue = Traffic.objects.filter(created__year=2014, created__week_day=3).aggregate(Sum('sessions'))

total_sessions_2014_wed = Traffic.objects.filter(created__year=2014, created__week_day=4).aggregate(Sum('sessions'))

total_sessions_2014_thu = Traffic.objects.filter(created__year=2014, created__week_day=5).aggregate(Sum('sessions'))

total_sessions_2014_fri = Traffic.objects.filter(created__year=2014, created__week_day=6).aggregate(Sum('sessions'))

The problem is that I need to create several dozen more filters, because I have 3 years of data with multiple data points per column that we need to total.

Questions:

  1. Can I combine call #1 into call #2?
  2. Can I use call #2 to query the sums for call #3, so I don't have to pull all objects from the database, filter them, and then repeat this a couple dozen more times?

As you can see, this is going to get out of control very quickly. Any help would be hugely appreciated. Thank you.

Updated to add Traffic Model

from django.db import models


class Timestamp(models.Model):
    created = models.DateField()

    class Meta:
        abstract = True


class Traffic(Timestamp):
    sessions = models.IntegerField(blank=True, null=True)
    new_users = models.IntegerField(blank=True, null=True)
    reminder = models.IntegerField(blank=True, null=True)
    campaigns = models.IntegerField(blank=True, null=True)
    new_sales = models.IntegerField(blank=True, null=True)
    sales_renewals = models.IntegerField(blank=True, null=True)

    # Meta and String
    class Meta:
        verbose_name = 'Traffic'
        verbose_name_plural = 'Traffic Data'

    def __str__(self):
        return "%s" % self.created
Charles Smith asked Mar 23 '16 06:03

1 Answer

There are dozens of ways to optimize your database queries with the Django ORM. As usual, the Django documentation is great and has a good list of them. Here are some quick tips for query optimization:

1) iterator()

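Use iterator() if you only need to go through the queryset once: it reads the results without caching them on the queryset, which keeps memory usage down on large tables. For example: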

traffic = Traffic.objects.all()

for t in traffic.iterator():
    ...
    ...

2) db_index=True

Set db_index=True when defining the fields of your models. As the Django documentation says:

This is a number one priority, after you have determined from profiling what indexes should be added. Use Field.db_index or Meta.index_together to add these from Django. Consider adding indexes to fields that you frequently query using filter(), exclude(), order_by(), etc. as indexes may help to speed up lookups.

The filters in your view run against created (for example created__year=2014), so created is the field an index is most likely to help; columns you only sum up gain little from one. Hence you can modify your abstract base model as:

from django.db import models


class Timestamp(models.Model):
    # Indexed because the views filter on created (e.g. created__year=2014)
    created = models.DateField(db_index=True)

    class Meta:
        abstract = True

3) prefetch_related() or select_related()

If your models have relations (foreign keys, many-to-many fields), use select_related or prefetch_related whenever you access the related objects. As the Django documentation puts it:

select_related works by creating a SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a 'many' relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the 'joining' in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related.

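In short, select_related does a join, while prefetch_related runs one additional query per relationship and does the joining in Python. Either one avoids issuing a separate query for every row when you access a related object inside a loop, which can make a large difference on big tables.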


4) Django Pagination

If your template design allows you to display results across multiple pages, you can use pagination, so that each request loads only one page of rows instead of the whole table.


5) Querysets are Lazy

You also need to understand that Django querysets are lazy, which means that they won't query the database until they are evaluated. A queryset in Django represents a number of rows in the database, optionally filtered by a query. For example:

traffic = Traffic.objects.all()

The above code doesn't run any database queries. You can take the traffic queryset and apply additional filters, or pass it to a function, and nothing will be sent to the database. This is good, because querying the database is one of the things that significantly slows down web applications. To fetch the data from the database, you need to evaluate the queryset, for example by iterating over it:

for t in traffic.iterator():
    print(t.sessions)

6) django-debug-toolbar

Django Debug Toolbar is a configurable set of panels that display various debug information about the current request/response and, when clicked, show more details about the panel's content. These include:

  • Request timer
  • SQL queries including time to execute and links to EXPLAIN each query
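For reference, a settings sketch based on the installation steps in django-debug-toolbar's documentation; those steps have changed between releases, so treat it as an assumption and check the docs for your version. You also need to add the toolbar's URLs to your URLconf (for example path("__debug__/", include("debug_toolbar.urls"))), and the toolbar only appears when DEBUG is True and the request comes from an address in INTERNAL_IPS.

```python
# settings.py fragment, development only (sketch; merge into your existing lists).
DEBUG = True

INSTALLED_APPS = [
    # ... your existing apps ...
    "django.contrib.staticfiles",  # the toolbar serves its own static files
    "debug_toolbar",
]

MIDDLEWARE = [
    # Place it early, but after middleware that encodes the response (e.g. GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... your existing middleware ...
]

# The toolbar is only shown to requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]
```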

Applying this to your code (remember that querysets are lazy):

from django.db.models import Sum

traffic = Traffic.objects.all()  # lazy: no query yet
# aggregate() is evaluated immediately: one query for all six totals
totals = traffic.aggregate(Sum('sessions'), Sum('new_users'), Sum('reminder'), Sum('campaigns'), Sum('new_sales'), Sum('sales_renewals'))
total_sessions = totals.get('sessions__sum')
total_new_users = totals.get('new_users__sum')
total_reminder = totals.get('reminder__sum')
total_campaigns = totals.get('campaigns__sum')
total_new_sales = totals.get('new_sales__sum')
total_sales_renewals = totals.get('sales_renewals__sum')

t_2014 = traffic.filter(created__year=2014)  # still lazy
t_sessions_2014_wd2 = t_2014.filter(created__week_day=2).aggregate(Sum('sessions'))
...
...

For Call #1 in the template (the for loop that displays a table of all results):

{% for t in traffic.iterator %}
    {{ t.sessions }}
    ...
    ...
{% endfor %}
Parag Tyagi answered Oct 20 '22 09:10