I'm using Django's inbuilt full text search with PostgreSQL.
The Django docs say that performance can be improved by using a SearchVectorField
. That field keeps a pre-generated ts_vector
column with all the relevant lexemes alongside the model, rather than generating it on the fly during every search.
However, with this approach the ts_vector
must be updated whenever the model is updated. To keep it synchronised, the Django docs suggest using "triggers", and refer us to the PostgreSQL documentation for more details.
However, the PostgreSQL docs themselves say that the trigger approach is now obsolete. Instead of manually updating the ts_vector
column, it is better to keep the column automatically up-to-date by using a stored generated column.
How can I use PostgreSQL's recommended approach with Django?
I worked out how to do this using custom migrations. The main caveat is that you'll need to update these migrations manually whenever the base model (that you're searching) changes.
Note you must be using PostgreSQL 12 for the following to work:
First, create a database column to store the tsvector:
$ python manage.py makemigrations my_app --empty
Migrations for 'my_app':
my_app/migrations/005_auto_20200625_1933.py
Open the new migration file to edit it. We will need to create a column to store the tsvector WITHOUT any associated field in the model definition, so that Django doesn't try to update the auto-generated field itself.
The main downside to this approach is that because this isn't synced to the Django model, if the fields change then a new migration will need to be manually created.
#my_app/migrations/0010_add_tsvector.py
"""
Use setweight() to rank results by weight (where 'A' is highest).
Use PostgreSQL tsvector concatenation operator || to combine multiple
fields from the table. Use `coalesce` ensure that NULL is not
returned if a field is empty.
In this case, `blog_table` is the database table name, and
`textsearch` is the new column, but you can choose anything here
"""
operations = [
migrations.RunSQL(sql="""
ALTER TABLE "blog_content" ADD COLUMN "textsearch" tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(body, '')), 'A') ||
setweight(to_tsvector('english', coalesce(title, '')), 'B') ||
' '
) STORED NULL;
""", reverse_sql="""
ALTER TABLE "blog_content" DROP COLUMN "textsearch";
"""
)
]
To create the new column in the database, run:
$ python manage.py migrate my_app
Then, to use the column in text searches:
#my_app/views.py
from django.db.models.expressions import RawSQL
from django.contrib.postgres.search import SearchVectorField
from django.views.generic.list import ListView
class TextSearchView(ListView):
def get_queryset(self):
'''Return list of top results
Since there is no model field, we must manually retrieve the
column, using `annotate`
'''
query = self.request.GET.get('search_term')
return Blog.objects.annotate(
ts=RawSQL(
'textsearch',
params=[],
output_field=SearchVectorField()
)
).filter(
ts=query
)
Note that the results will already be ranked, because the weights are applied each time the tsvector updates itself.
A django ticket that requests this new feature has been rejected:
... Generated/Function-based virtual columns are a huge topic that would require many changes and have many caveats, e.g. functions must be deterministic. They are feasible also on Oracle. This kind of features require few weeks (even months) of works, a precise plan, and should be preceded by a discussion on DevelopersMailingList and even DEP.
Some workarounds include:
models.Model.save
.forms.ModelForm.save
.I have found an article that showcases some coding practices that will keep you going.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With