I'm trying to use ltree
extension in PostgreSQL to build a full-text address search engine.
My model looks like this (it's slightly simplified):
from django.db import models
class Addresses(models.Model):
name = models.CharField(max_length=255)
path = models.CharField(max_length=255)
So, data in this table will look like this:
id | name | path
----------------------------
1 | USA | 1
2 | California | 1.2
3 | Los Angeles | 1.2.3
I want to do a full-text search on the aggregated name of each entity. Basically I need to convert each row in table to the next format to do a search:
id | full_name | path
-------------------------------------------------
1 | USA | 1
2 | California USA | 1.2
3 | Los Angeles California USA | 1.2.3
I'm doing that in such way, so user can perform queries like 'los ang cali' or similar. I have no problems to do that with raw PostgreSQL query:
SELECT *, ts_rank_cd(to_tsvector('english', full_address), query) AS rank
FROM (SELECT s.id, s.path, array_to_string(array_agg(a.name ORDER BY a.path DESC), ' ') AS full_address
FROM "Addresses" AS s INNER JOIN "Addresses" AS a
ON (a.path @> s.path) GROUP BY s.id, s.path, s.name
) AS subquery, to_tsquery('english', %s) as query WHERE to_tsvector('english', full_address) @@ query
ORDER BY rank DESC;
That works fine, but while using RawQuerySet, I can't use things like .filter()
, .group_by()
, pagination, etc.
The main constraint to reproduce it in Django is this JOIN:
JOIN "Addresses" AS a ON (a.path @> s.path)
it's used to join all ancestors of each element and then aggregate them using array_agg()
, array_to_string
functions, so the output of these functions can be used further in full-text search.
If anyone have better ideas how to implement such kind of thing using Django ORM, please advise.
You need an unmanaged model backed by a VIEW.
Creating an unmanaged model is achieved by setting the managed meta option of a model to false.
If False, no database table creation or deletion operations will be performed for this model. This is useful if the model represents an existing table or a database view that has been created by some other means. This is the only difference when managed=False. All other aspects of model handling are exactly the same as normal. This includes
Emphasis mine.
Thus if you create an unmanaged model it can be represented by a view on the database and you have access to .filter()
, .group_by()
on it.
The view is your query.
CREATE OR REPLACE view full_address_tree AS
SELECT a.*, s.id, s.path, array_to_string(array_agg(a.name ORDER BY a.path DESC), ' ') AS full_address
FROM "Addresses" AS s INNER JOIN "Addresses" AS a
ON (a.path @> s.path) GROUP BY s.id, s.path, s.name
class FullAddressTree(models.Model):
# copy paste the fields from your Addresses model here
sid = models.IntegerField()
sid = models.CharField()
class Meta:
# this is the most important part
managed = False
db_table = 'full_address_tree' # the name of the view
Thus now you have a model which can be used to do full text searches without having to resort to raw queries. Thus you have the full power of the Django ORM at your disposal.
If you want a migration, you will find that ./manage.py makemigrations results in a dummy migration. ./manage.py sqlmigrate will reveal that no sql queries are executed for this migration.
To fix it and to have the view created automatically add a RunSQL call to the operations
list in that migration.
migrations.RunSQL(''' COPY PASTE SQL QUERY FROM ABOVE ''')
The unmanaged model you have created is read only. Attempting to Create, Replace, Update or Delete will fail. If you need this functionality you will need an INSTEAD trigger.
So big +1 @shang-wang for their suggestion on django-mptt. Using that gets around your problem because all the tree operations in MPTT work as a regular QuerySet
and thus are chainable to annotate
and aggregate
. The only thing I'm not sure of is if your problem is insert heavy. If you're just planning on dumping a lot of data into the table once, then no big deal. If you're going to be modifying the tree often then it might be a bit more of a problem. For good description of what MPTT is and how it works http://www.sitepoint.com/hierarchical-data-database-2/
Anyhow, your original problem of getting all the ancestors of a node then becomes
la_node.get_ancestors()
. That gets around the join constraint you mentioned which should make it possible to reformulate the remainder of the query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With