Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Django: Join on the same table

I'm trying to use ltree extension in PostgreSQL to build a full-text address search engine.

My model looks like this (it's slightly simplified):

from django.db import models


class Addresses(models.Model):
    name = models.CharField(max_length=255)
    path = models.CharField(max_length=255)

So, data in this table will look like this:

id  |   name       |  path
----------------------------
 1  |  USA         | 1
 2  |  California  | 1.2
 3  |  Los Angeles | 1.2.3

I want to do a full-text search on the aggregated name of each entity. Basically I need to convert each row in table to the next format to do a search:

    id  |           full_name            |  path
-------------------------------------------------
  1     |  USA                           |   1
  2     |  California USA                |   1.2
  3     |  Los Angeles California USA    |   1.2.3

I'm doing that in such way, so user can perform queries like 'los ang cali' or similar. I have no problems to do that with raw PostgreSQL query:

SELECT *, ts_rank_cd(to_tsvector('english', full_address), query) AS rank 
FROM (SELECT s.id, s.path, array_to_string(array_agg(a.name ORDER BY a.path DESC), ' ') AS full_address
        FROM "Addresses" AS s INNER JOIN "Addresses" AS a
        ON (a.path @> s.path) GROUP BY s.id, s.path, s.name
) AS subquery, to_tsquery('english', %s) as query WHERE   to_tsvector('english', full_address) @@ query
ORDER BY rank DESC;

That works fine, but while using RawQuerySet, I can't use things like .filter(), .group_by(), pagination, etc.

The main constraint to reproduce it in Django is this JOIN:

JOIN "Addresses" AS a ON (a.path @> s.path)

it's used to join all ancestors of each element and then aggregate them using array_agg(), array_to_string functions, so the output of these functions can be used further in full-text search.

If anyone have better ideas how to implement such kind of thing using Django ORM, please advise.

like image 659
Vit D Avatar asked Dec 30 '15 16:12

Vit D


2 Answers

Summary

You need an unmanaged model backed by a VIEW.

An unmanaged model.

Creating an unmanaged model is achieved by setting the managed meta option of a model to false.

If False, no database table creation or deletion operations will be performed for this model. This is useful if the model represents an existing table or a database view that has been created by some other means. This is the only difference when managed=False. All other aspects of model handling are exactly the same as normal. This includes

Emphasis mine.

Thus if you create an unmanaged model it can be represented by a view on the database and you have access to .filter(), .group_by() on it.

The View.

The view is your query.

CREATE OR REPLACE view full_address_tree AS

SELECT a.*, s.id, s.path, array_to_string(array_agg(a.name ORDER BY a.path DESC), ' ') AS full_address
        FROM "Addresses" AS s INNER JOIN "Addresses" AS a
        ON (a.path @> s.path) GROUP BY s.id, s.path, s.name

Creating the model

class FullAddressTree(models.Model):
    # copy paste the fields from your Addresses model here
    sid = models.IntegerField()
    sid = models.CharField()

    class Meta:
        # this is the most important part
        managed = False
        db_table = 'full_address_tree' # the name of the view

Thus now you have a model which can be used to do full text searches without having to resort to raw queries. Thus you have the full power of the Django ORM at your disposal.

Migrations.

If you want a migration, you will find that ./manage.py makemigrations results in a dummy migration. ./manage.py sqlmigrate will reveal that no sql queries are executed for this migration.

To fix it and to have the view created automatically add a RunSQL call to the operations list in that migration.

migrations.RunSQL(''' COPY PASTE SQL QUERY FROM ABOVE ''')

Caveats

The unmanaged model you have created is read only. Attempting to Create, Replace, Update or Delete will fail. If you need this functionality you will need an INSTEAD trigger.

like image 81
e4c5 Avatar answered Oct 24 '22 13:10

e4c5


So big +1 @shang-wang for their suggestion on django-mptt. Using that gets around your problem because all the tree operations in MPTT work as a regular QuerySet and thus are chainable to annotate and aggregate. The only thing I'm not sure of is if your problem is insert heavy. If you're just planning on dumping a lot of data into the table once, then no big deal. If you're going to be modifying the tree often then it might be a bit more of a problem. For good description of what MPTT is and how it works http://www.sitepoint.com/hierarchical-data-database-2/

Anyhow, your original problem of getting all the ancestors of a node then becomes la_node.get_ancestors(). That gets around the join constraint you mentioned which should make it possible to reformulate the remainder of the query.

like image 45
Paul Avatar answered Oct 24 '22 13:10

Paul