
Combine trigram with ranked searching in django 1.10

We are working on search in Django 1.10 and need to combine ranked searching with trigram searching.

Our code is this:

def get_queryset(self):
    search = self.request.GET.get('text', '')
    vector = SearchVector(
        'name',
        weight='A',
        config=settings.SEARCH_LANGS[settings.LANGUAGE],
    ) + SearchVector(
        'content',
        weight='B',
        config=settings.SEARCH_LANGS[settings.LANGUAGE],
    )
    query = SearchQuery(search)
    return Article.objects.annotate(
        rank=SearchRank(vector, query),
        similarity=TrigramSimilarity(
            'name', search
        ) + TrigramSimilarity(
            'content', search
        ),
    ).filter(
        rank__gte=0.3
    ).filter(
        similarity__gt=0.3
    ).order_by('-similarity')[:20]

But this code doesn't return any results. Without the trigram filter we have no problems, but when we combine the two we can't get any results.

How can we combine trigram and ranked searching in django 1.10?

SalahAdDin asked Jun 16 '16 12:06


1 Answer

We investigated further and now understand better how search weights work.

According to the documentation, weights can be assigned per field, and similarly trigrams can be used to filter by similarity or distance.

However, the documentation does not give an example that uses the two together, and even after investigating further it is not very clear how the weights work.

A little logic tells us that if we search for a word that is common to all records, all the ranks will be 0. Similarity varies much more than rank, although it tends towards lower values than rank does.
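To see why similarity against a long field tends to be low, here is a minimal pure-Python sketch of how the pg_trgm extension computes similarity as documented: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character sequences, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names and the sample phrase are ours, for illustration only, and the sketch ignores locale and encoding details that PostgreSQL handles.

```python
import re


def trigrams(text):
    """Return the set of trigrams of `text`, approximating pg_trgm."""
    result = set()
    # Lowercase, and treat anything that is not a letter or digit as a
    # word separator.
    for word in re.findall(r"[^\W_]+", text.lower()):
        # Each word is padded with two spaces before and one after.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical sample text, not taken from the real data.
short_score = similarity("perfil", "perfil")
long_score = similarity("perfil", "como cambiar la foto de perfil")
```

In this sketch the exact match scores 1.0, while the same word inside the longer phrase falls below the 0.3 threshold used in the filter, because every extra word adds trigrams to the denominator.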

Now, text search, as far as we understand, is carried out based on the text contained in the fields being filtered, more than on the language set in the configuration. As an example, the model we used had a title field and a content field, and the most common words in the titles were 'como' and 'cambiar' ("how" and "change"). Ranks work as query annotations, so we can use values or values_list to review the ranks and similarities, which are numeric values, and we can see the weighted words by inspecting the vector object.

Reviewing the weighted words, we saw that weights had indeed been assigned, but to truncated forms of the words: we found 'perfil' and 'cambi', but not 'cambiar' or 'como'. However, all the model instances contained the same content text, 'lorem ipsun ...', and all the words of that sentence were kept whole, with weight B. We conclude from this that searches are done based on the contents of the fields being filtered more than on the language with which we configure the search.

That said, here is the code we ended up using.

First, to use trigrams we need to enable the necessary extension in the database:

from django.contrib.postgres.operations import TrigramExtension, UnaccentExtension
from django.db import migrations


class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        ...
        # Installs the pg_trgm extension, required by TrigramSimilarity.
        TrigramExtension(),
        # Installs the unaccent extension.
        UnaccentExtension(),
    ]

Import the extension operations from django.contrib.postgres.operations and run them from any migration file.

The next step is to change the code from the question so that the filter returns the results of one of the queries if the other one fails:

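from django.conf import settings
from django.contrib.postgres.search import (
    SearchQuery, SearchRank, SearchVector, TrigramSimilarity,
)
from django.db.models import Q

# Article is the project's own model; import it from your app.


def get_queryset(self):
    # Keep the raw text as well: TrigramSimilarity compares against a
    # plain string, while SearchRank needs a SearchQuery.
    search = self.request.GET.get('q', '')
    search_query = SearchQuery(search)

    vector = SearchVector(
        'name',
        weight='A',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    ) + SearchVector(
        'content',
        weight='B',
        config=settings.SEARCH_LANGS[settings.LANGUAGE_CODE],
    )

    # `actives` and `publics` are assumed to be custom managers on Article.
    if self.request.user.is_authenticated:
        queryset = Article.actives.all()
    else:
        queryset = Article.publics.all()

    return queryset.annotate(
        rank=SearchRank(vector, search_query),
        similarity=TrigramSimilarity(
            'name', search
        ) + TrigramSimilarity(
            'content', search
        ),
    ).filter(
        # Keep a row if either score passes its threshold.
        Q(rank__gte=0.3) | Q(similarity__gt=0.3)
    ).order_by('-rank')[:20]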

The problem with the code in the question was that it applied one filter after the other, so a record had to pass both; if the chosen word does not match in either of the two searches, the problem is even greater. We use a Q object to combine the filters with an OR connector, so that if one of the two does not return the desired value, the other is used in its place.
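The difference between the two filtering strategies can be illustrated without a database. The following is a plain-Python sketch, not Django code: the rows and their scores are made up, and the helper names are ours. Chaining two filters on annotations behaves like AND, while the Q object version behaves like OR.

```python
# Hypothetical annotated rows; the scores are invented for illustration.
rows = [
    {"name": "a", "rank": 0.6, "similarity": 0.1},
    {"name": "b", "rank": 0.0, "similarity": 0.5},
    {"name": "c", "rank": 0.4, "similarity": 0.4},
    {"name": "d", "rank": 0.0, "similarity": 0.1},
]

THRESHOLD = 0.3


def chained_filters(rows):
    # Mirrors .filter(rank__gte=0.3).filter(similarity__gt=0.3):
    # a row must pass both conditions.
    return [
        r["name"] for r in rows
        if r["rank"] >= THRESHOLD and r["similarity"] > THRESHOLD
    ]


def or_filter(rows):
    # Mirrors .filter(Q(rank__gte=0.3) | Q(similarity__gt=0.3)):
    # a row passes if either condition holds.
    return [
        r["name"] for r in rows
        if r["rank"] >= THRESHOLD or r["similarity"] > THRESHOLD
    ]


both = chained_filters(rows)
either = or_filter(rows)
```

With these made-up scores only row "c" survives the chained filters, whereas the OR version also keeps "a" (good rank, low similarity) and "b" (no rank, good similarity).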

This is enough for our purposes. However, more in-depth clarifications on how these weights and trigrams work are welcome, so that we can make the most of this new feature offered by the latest version of Django.

SalahAdDin answered Oct 23 '22 00:10