 

How to do an accent-insensitive TrigramSimilarity search in django?

How can I add accent-insensitive search to the following snippet from the Django docs:

>>> from django.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name='Katy Stevens')
>>> Author.objects.create(name='Stephen Keats')
>>> test = 'Katie Stephens'
>>> Author.objects.annotate(
...     similarity=TrigramSimilarity('name', test),
... ).filter(similarity__gt=0.3).order_by('-similarity')
[<Author: Katy Stevens>, <Author: Stephen Keats>]

How could this match test = 'Kâtié Stéphèns'?

Private asked Sep 04 '25 04:09


1 Answer

Django provides an unaccent lookup:

The unaccent lookup allows you to perform accent-insensitive lookups using a dedicated PostgreSQL extension.
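To make "accent-insensitive" concrete: PostgreSQL's unaccent() is a dictionary-based function driven by a rules file, so the following is only a rough pure-Python stand-in, not what the database runs. It decomposes each character with Unicode NFKD and drops the combining marks (the accents). The name strip_accents is made up for this sketch; characters with no Unicode decomposition, such as ß or ø, are left unchanged here even though unaccent may map them.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Rough stand-in for PostgreSQL's unaccent(): decompose each character
    (NFKD) and drop the combining marks, i.e. the accents."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Kâtié Stéphèns"))  # Katie Stephens
```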

Also, the aggregation section of the Django docs says the following:

When specifying the field to be aggregated in an aggregate function, Django will allow you to use the same double underscore notation that is used when referring to related fields in filters. Django will then handle any table joins that are required to retrieve and aggregate the related value.


Derived from the above:

You can combine the trigram_similar lookup with unaccent and then annotate the result:

Author.objects.filter(
    name__unaccent__trigram_similar=test
).annotate(
    similarity=TrigramSimilarity('name__unaccent', test),
).filter(similarity__gt=0.3).order_by('-similarity')
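The trigram_similar lookup and TrigramSimilarity are backed by PostgreSQL's pg_trgm extension. To see why the unaccent step matters, here is a rough pure-Python model of pg_trgm's similarity(), assuming the behaviour described in the pg_trgm docs: lowercase the text, split it into words on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams overall. The helper names are hypothetical, strip_accents is a Unicode-decomposition stand-in for unaccent, and real scores may differ (for example, which characters count as alphanumeric can depend on the database locale).

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate unaccent(): NFKD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigram set: lowercase, split on non-alphanumerics,
    pad each word with two leading spaces and one trailing space."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


name, test = "Katy Stevens", "Kâtié Stéphèns"
print(similarity(name, test))                                # 0.1666... (below 0.3)
print(similarity(strip_accents(name), strip_accents(test)))  # 0.4 (above 0.3)
```

In this model, 'Katy Stevens' falls below the 0.3 cutoff against the accented search term but clears it once both sides are unaccented, which suggests unaccent has to apply to the search term as well as to the column.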

OR

if you want to stay as close as possible to the original sample (and avoid running one potentially slow filter followed by another):

Author.objects.annotate(
    similarity=TrigramSimilarity('name__unaccent', test),
).filter(similarity__gt=0.3).order_by('-similarity')

These only work in Django 1.10 or later.


EDIT:

Although the above should work, @Private reports this error:

Cannot resolve keyword 'unaccent' into a field. Join on 'unaccented' not permitted.

This may be a bug, or unaccent may not be intended to be used that way. The following code runs without the error:

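from django.contrib.postgres.search import TrigramSimilarity

test = 'Kâtié Stéphèns'
# unaccent is a bilateral transform: this filter compares UNACCENT(name) with UNACCENT(test)
Author.objects.filter(
    name__unaccent__trigram_similar=test
).annotate(
    similarity=TrigramSimilarity('name', test),  # raw name vs. raw test string
).filter(similarity__gt=0.3).order_by('-similarity')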
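One caveat about this workaround: Django's unaccent transform is bilateral, so the filter() step compares UNACCENT(name) with UNACCENT(test), but the annotation compares the raw column with the raw, accented test string. The rough pure-Python model below (an approximation of pg_trgm and unaccent, not PostgreSQL itself; run_query and the other helpers are hypothetical) suggests that with stored names 'Katy Stevens' and 'Stephen Keats' and test = 'Kâtié Stéphèns', both rows pass the first filter, but their annotated scores fall below 0.3, so the final similarity__gt=0.3 filter would drop them. An untested alternative is to unaccent both arguments of the annotation, e.g. TrigramSimilarity(Unaccent('name'), Unaccent(Value(test))), with Unaccent imported from django.contrib.postgres.lookups and Value from django.db.models.

```python
import re
import unicodedata

THRESHOLD = 0.3  # the answer's cutoff; also pg_trgm's default similarity_threshold


def strip_accents(text: str) -> str:
    """Rough stand-in for unaccent(): NFKD-decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similarity(a: str, b: str) -> float:
    """pg_trgm-style similarity: shared trigrams / all distinct trigrams."""
    def trigrams(text: str) -> set[str]:
        grams = set()
        for word in re.findall(r"[^\W_]+", text.lower()):
            padded = f"  {word} "
            grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return grams

    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def run_query(names: list[str], test: str, unaccent_score: bool) -> list[tuple[str, float]]:
    """Model of filter(name__unaccent__trigram_similar=test)
    .annotate(similarity=...).filter(similarity__gt=0.3).order_by('-similarity')."""
    # filter(): unaccent is bilateral, so both the column and test are unaccented.
    candidates = [n for n in names
                  if similarity(strip_accents(n), strip_accents(test)) > THRESHOLD]
    # annotate(): TrigramSimilarity('name', test) scores the raw values.
    if unaccent_score:
        scored = [(n, similarity(strip_accents(n), strip_accents(test))) for n in candidates]
    else:
        scored = [(n, similarity(n, test)) for n in candidates]
    # filter(similarity__gt=0.3).order_by('-similarity')
    kept = [(n, s) for n, s in scored if s > THRESHOLD]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


authors = ["Katy Stevens", "Stephen Keats"]
print(run_query(authors, "Kâtié Stéphèns", unaccent_score=False))  # []
print(run_query(authors, "Kâtié Stéphèns", unaccent_score=True))   # Katy Stevens first, then Stephen Keats
```

In the model, only the variant that unaccents both sides of the score returns both authors, ranked as in the original docs example.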
John Moutafis answered Sep 06 '25 17:09


