Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django Full Text SearchVectorField obsolete in PostgreSQL

I'm using Django's inbuilt full text search with PostgreSQL.

The Django docs say that performance can be improved by using a SearchVectorField. That field keeps a pre-generated ts_vector column with all the relevant lexemes alongside the model, rather than generating it on the fly during every search.

However, with this approach the ts_vector must be updated whenever the model is updated. To keep it synchronised, the Django docs suggest using "triggers", and refer us to the PostgreSQL documentation for more details.

However, the PostgreSQL docs themselves say that the trigger approach is now obsolete. Instead of manually updating the ts_vector column, it is better to keep the column automatically up-to-date by using a stored generated column.

How can I use PostgreSQL's recommended approach with Django?

like image 460
gatlanticus Avatar asked Jan 10 '20 04:01

gatlanticus


2 Answers

I worked out how to do this using custom migrations. The main caveat is that you'll need to update these migrations manually whenever the base model (that you're searching) changes.

Note you must be using PostgreSQL 12 for the following to work:


First, create a database column to store the tsvector:

$ python manage.py makemigrations my_app --empty

Migrations for 'my_app':
  my_app/migrations/005_auto_20200625_1933.py

Open the new migration file to edit it. We will need to create a column to store the tsvector WITHOUT any associated field in the model definition, so that Django doesn't try to update the auto-generated field itself.

The main downside to this approach is that because this isn't synced to the Django model, if the fields change then a new migration will need to be manually created.

#my_app/migrations/0010_add_tsvector.py

"""
Use setweight() to rank results by weight (where 'A' is highest).
Use PostgreSQL tsvector concatenation operator || to combine multiple
fields from the table. Use `coalesce` ensure that NULL is not
returned if a field is empty.

In this case, `blog_table` is the database table name, and
`textsearch` is the new column, but you can choose anything here
"""

operations = [
    migrations.RunSQL(sql="""
        ALTER TABLE "blog_content" ADD COLUMN "textsearch" tsvector
        GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(body, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(title, '')), 'B') ||
        ' '
        ) STORED NULL;
    """, reverse_sql="""
        ALTER TABLE "blog_content" DROP COLUMN "textsearch";
    """
    )
]

To create the new column in the database, run:

$ python manage.py migrate my_app

Then, to use the column in text searches:

#my_app/views.py

from django.db.models.expressions import RawSQL
from django.contrib.postgres.search import SearchVectorField
from django.views.generic.list import ListView


class TextSearchView(ListView):
    def get_queryset(self):
        '''Return list of top results
        
        Since there is no model field, we must manually retrieve the
        column, using `annotate`
        '''
        query = self.request.GET.get('search_term')

        return Blog.objects.annotate(
                ts=RawSQL(
                    'textsearch',
                    params=[],
                    output_field=SearchVectorField()
                )
            ).filter(
                ts=query
            )

Note that the results will already be ranked, because the weights are applied each time the tsvector updates itself.

like image 106
gatlanticus Avatar answered Sep 19 '22 02:09

gatlanticus


A django ticket that requests this new feature has been rejected:

... Generated/Function-based virtual columns are a huge topic that would require many changes and have many caveats, e.g. functions must be deterministic. They are feasible also on Oracle. This kind of features require few weeks (even months) of works, a precise plan, and should be preceded by a discussion on DevelopersMailingList and even DEP.

Some workarounds include:

  • Populating the contents of the field in models.Model.save.
  • Populating the contents of the field in forms.ModelForm.save.
  • Using a post-save signal.

I have found an article that showcases some coding practices that will keep you going.

like image 29
raratiru Avatar answered Sep 21 '22 02:09

raratiru