
Using Levenshtein function on each element in a tsvector?

I'm trying to build a fuzzy search on Postgres, using django-watson as the base search engine.

I have a field called search_tsv, which is a tsvector containing all the field values of the model that I want to search on.

I want to use the Levenshtein function, which does exactly what I want on a text field. However, I don't know how to run it on each individual element of the tsvector.

Is there a way to do this?

Stephen Baden asked Jan 16 '23 18:01



1 Answer

Consider the extension pg_trgm instead of levenshtein(). It is faster by orders of magnitude when backed by a GiST index, which supports nearest-neighbor (KNN) search in PostgreSQL 9.1 or later.

Install the extension once per database:

CREATE EXTENSION pg_trgm;

Then use the <-> (distance) or % (similarity) operator. Several related answers have been posted here on SO; search for pg_trgm [PostgreSQL] ...
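To make the two operators concrete, here is a minimal sketch. The table search_word and the index name are hypothetical and only serve the illustration; gist_trgm_ops is the operator class that pg_trgm provides. The % operator is true when the trigram similarity of both sides exceeds the current threshold (0.3 by default), and <-> returns one minus that similarity, so smaller means closer.

```sql
-- Hypothetical table of words to search; all names are illustrative only.
CREATE TABLE search_word (word text NOT NULL);
INSERT INTO search_word (word) VALUES ('cat'), ('fat'), ('rat');

-- GiST index using the trigram operator class from pg_trgm.
CREATE INDEX search_word_trgm_idx
    ON search_word USING gist (word gist_trgm_ops);

-- Nearest-neighbor search: closest words first.
-- ORDER BY ... <-> is the form the GiST index can support.
SELECT word, word <-> 'brat' AS trg_dist
FROM   search_word
ORDER  BY word <-> 'brat'
LIMIT  10;

-- Similarity filter: keep only rows above the similarity threshold.
SELECT word
FROM   search_word
WHERE  word % 'rats';
```

On a table this small the planner will most likely ignore the index; the point is only the shape of the queries.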


Wild shot at what you may want:

WITH x AS (
    SELECT unnest(string_to_array(trim(strip(
      'fat:2,4 cat:3 rat:5A'::tsvector)::text, ''''), ''' ''')) AS val
    )                                    -- provide tsvector, extract strings
    , y AS( SELECT 'brat'::text AS term) -- provide term to match
SELECT val, term
      ,(val <-> term) AS trg_dist        -- distance operator
      ,levenshtein(val, term) AS lev_dist
FROM   x, y;

Returns:

 val | term | trg_dist | lev_dist
-----+------+----------+----------
 cat | brat |    0.875 |        2
 fat | brat |    0.875 |        2
 rat | brat | 0.714286 |        1
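On PostgreSQL 9.6 or later, the string manipulation above can probably be replaced with tsvector_to_array(), which returns the lexemes as a text array without positions or weights. The sketch below assumes that version; the temporary table doc is hypothetical, and only the column name search_tsv is taken from the question. Note that levenshtein() comes from the fuzzystrmatch extension, not from pg_trgm.

```sql
-- levenshtein() lives in fuzzystrmatch; <-> on text comes from pg_trgm.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Same lexemes as in the query above, extracted with tsvector_to_array().
SELECT val
     , val <-> 'brat'           AS trg_dist
     , levenshtein(val, 'brat') AS lev_dist
FROM   unnest(tsvector_to_array('fat:2,4 cat:3 rat:5A'::tsvector)) AS val
ORDER  BY lev_dist, val;

-- Applied to a table column (hypothetical table, illustrative data).
CREATE TEMP TABLE doc (id int, search_tsv tsvector);
INSERT INTO doc VALUES (1, 'fat:2,4 cat:3 rat:5A');

-- One output row per (document, lexeme) pair within edit distance 2.
SELECT d.id, lex, levenshtein(lex, 'brat') AS lev_dist
FROM   doc d
CROSS  JOIN LATERAL unnest(tsvector_to_array(d.search_tsv)) AS lex
WHERE  levenshtein(lex, 'brat') <= 2;
```

If the tsvector was built with to_tsvector(), its lexemes are normalized (typically stemmed and lowercased), so the distances are measured against those normalized forms rather than the original words.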
Erwin Brandstetter answered Jan 21 '23 18:01
