Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django psql full text search not matching un-stemmed word

I'm running Django 1.10.1 against Postgres 9.4. My staging server and dev environments have psql servers at version 9.4.9 and production is an RDS instance at 9.4.7.

It seems like my SearchVectorField is not storing the search configuration given in production, though it is in staging and dev, and it seems to be either a version thing (unlikely, given the version difference and that it also worked on 9.3 in staging/dev) or the fact that production is on RDS instead of local on the server.

I'm using a custom configuration for full-text search called unaccent, which looks like this:

      Token      |     Dictionaries      
-----------------+-----------------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | unaccent,english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | unaccent,english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | unaccent,english_stem

Unaccent is installed in both environments, and works in both environments.

I'm storing the search data in a django.contrib.postgres.search.SearchVectorField on my Writer model:

class Writer(models.Model):
    #...
    search = SearchVectorField(blank=True)

That column is updated with the following search vector:

writer_search_vector = (SearchVector('first_name', 'last_name', 'display_name',
                                     config='unaccent', weight='A') +
                        SearchVector('raw_search_data', config='unaccent', weight='B'))

by the following statement, which runs periodically:

Writer.objects.update(search=search_utils.writer_search_vector)

And, for some reason, the configuration is storing successfully on my staging server and in dev, but not in production. E.g., this code returns the same results in all environments:

In [3]: Writer.objects.annotate(searchy=SearchVector('last_name')).filter(searchy='kostenberger')
Out[3]: <QuerySet []>
In [4]: Writer.objects.annotate(searchy=SearchVector('last_name', config='unaccent')).filter(searchy='kostenberger')
Out[4]: <QuerySet [<Writer: Andreas J. Köstenberger>, <Writer: Margaret Elizabeth Köstenberger>]>

But in staging, I get the following correct result if I use the stored vector:

In [5]: Writer.objects.filter(search='kostenberger')
Out[5]: <QuerySet [<Writer: Andreas J. Köstenberger>, <Writer: Margaret Elizabeth Köstenberger>]>

while in production, against the RDS instance, I get the following, incorrect result:

In [5]: Writer.objects.filter(search='kostenberger')
Out[5]: <QuerySet []>

and yet, in production still, the unaccent works but the english_stem does not, in that it will match the stemmed version of the text (below), but not the original version (above):

In [6]: Writer.objects.filter(search='kostenberg')
Out[6]: <QuerySet [<Writer: Margaret Elizabeth Köstenberger>, <Writer: Andreas J. Köstenberger>]>

Note that the database tables for Writer in the two environments are identical for this test.

Any ideas why the stored vector isn't working in production with the correct config, while if I create the vector on the fly it will work?

like image 486
ryanmrubin Avatar asked Oct 30 '22 20:10

ryanmrubin


1 Answers

On RDS Postgres, you aren't allowed to change the default_text_search_config parameter. So, you have to configure the text search with each query:

from django.contrib.postgres.search import SearchRank, SearchQuery
…
search_query = SearchQuery(value='kostenberger', config='unaccent')
Writer.objects.filter(search=search_query)
like image 60
David Eyk Avatar answered Nov 15 '22 08:11

David Eyk