Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating efficient database queries for hierarchical models (django)

Tags:

django

Consider this (django) model:

class Source(models.Model):
   # Some other fields
   type = models.ForeignKey('Type')

class Type(models.Model):
    # Some other fields
    parent = models.ForeignKey('self')

This model has a foreign key to itself, thus creating a hierarchy.

Let's say we have the following hierarchy:

Website
    Blog
    News    
    Social Network
    Q&A
    Forum
Radio
    Government radio
    Pirate radio
    Commercial radio
    Internet radio

How do I efficiently query so that if I select Source by Type, I also retrieve Sources which have a Type that is a child of the given type?

I have tried traversing through the whole tree, but that isn't very efficient obviously.

Another option would be to use a ManyToManyField and automatically attach parent Types by overriding the save() method. For example, if 'blog' is selected, a record for 'website' is also created. But this seems overkill to me.

like image 468
Izz ad-Din Ruhulessin Avatar asked Apr 13 '11 17:04

Izz ad-Din Ruhulessin


People also ask

What are the disadvantages of hierarchical database model?

The main disadvantage of the hierarchical model is that it can have one-to-one and one-to-many relationships between the nodes.

What is query optimization in Django?

The main goal of optimization in Django projects is to make it fast to perform database queries by ensuring that your projects run by making the best use of system resources. A properly optimized database will reduce the response time hence a better user experience.

How does Django handle large data?

Use bulk query. Use bulk queries to efficiently query large data sets and reduce the number of database requests. Django ORM can perform several inserts or update operations in a single SQL query. If you're planning on inserting more than 5000 objects, specify batch_size.

When would you use a hierarchical database model?

As the name suggests, the hierarchical database model is most appropriate for use cases in which the main focus of information gathering is based on a concrete hierarchy, such as several individual employees reporting to a single department at a company.


3 Answers

django-mptt or django-treebeard are great helpers for hierarchical data. They both add extra metadata to your model to allow efficient queries.

if you choose to use django-treebeard your model could look something like this:

from django.db import models
from treebeard.mp_tree import MP_Node

class Source(models.Model):
    # Some other fields
    type = models.ForeignKey('Type')

class Type(MP_Node):
    # Some other fields
    name = models.CharField(max_length=100)

    # parent gets added automatically by treebeard
    # parent = models.ForeignKey('self', blank=True, null=True)

and could be queried like this:

# get all Sources of Type type and descendants of type
type = Type.objects.get(name='Radio')
Source.objects.filter(type__in=type.get_descendants())

see https://django-treebeard.readthedocs.io/en/latest/api.html for more possible queries

like image 198
stefanfoulis Avatar answered Oct 26 '22 23:10

stefanfoulis


Such a structure can easily be retrieved using a recursive common table expression.

An example is e.g. here: http://www.postgresql.org/docs/current/static/queries-with.html

like image 23
a_horse_with_no_name Avatar answered Oct 27 '22 01:10

a_horse_with_no_name


How do I efficiently query so that if I select Source by Type, I also retrieve Sources which have a Type that is a child of the given type?

For the example given, it is fairly easy to setup a query because no recursive calls need to be made, and your "hierarchy" is only one level deep:

class Source(models.Model):
   # Some other fields
   type = models.ForeignKey('Type')

class Type(models.Model):
    # Some other fields
    name = models.CharField(max_length=100)
    parent = models.ForeignKey('self', blank=True, null=True)

#We want all sources under in the type = Radio tree
type = Type.objects.get(name='Radio')
qs = Source.objects.filter(type__parent=type)

#We want all sources that are `Forum` typed
type = Type.objects.get(name='Forum')
qs = Source.objects.filter(type=type)

This is assuming that Source is always related to a "child" type and not to the "parent."

If sources can also be related to the "parent" types, you can use Q for complex queries:

>>> from django.db.models import Q
>>> type = Type.objects.get(name='Radio')
>>> qs = Source.objects.filter(Q(type=type)|Q(type_parent=type))
>>> #if Radio type id = 2
>>> print qs.query
SELECT `app_source`.`id`, `app_source`.`type_id` FROM `app_source` INNER JOIN `app_type` ON  (`app_source`.`type_id` = `app_type`.`id`) WHERE (`app_source`.`type_id` = 2  OR `app_type`.`parent_id` = 2 )
>>> 

If you have a truly hierarchical trees in your tables, this method is much less usable, and you should seek out another solution.

like image 41
dting Avatar answered Oct 26 '22 23:10

dting