I'm trying to create a fuzzy search using Postgres and have been using django-watson as a base search engine to work off of.
I have a field called search_tsv that is a tsvector containing all the field values of the model that I want to search on.
I want to use the Levenshtein function, which does exactly what I want on a text field. However, I don't really know how to run it on each individual element of the tsvector.
Is there a way to do this?
Consider the extension pg_trgm instead of levenshtein(). It is faster by orders of magnitude when backed by a GiST index, which supports KNN (nearest-neighbour) searches in PostgreSQL 9.1 or later.
Install the extension once per database:
CREATE EXTENSION pg_trgm;
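Then use the <-> (distance) or % (similarity) operator: a <-> b returns 1 minus the trigram similarity of a and b, and a % b is true when their similarity exceeds the current threshold (0.3 by default). Several related answers have been posted here on SO; search for pg_trgm [PostgreSQL] ...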
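A minimal sketch of that setup, assuming a hypothetical table entry with a plain text column body (neither name comes from the question). The gist_trgm_ops operator class lets one GiST index serve both the % filter and the KNN ordering by <->:

```sql
-- Hypothetical table "entry" and text column "body", for illustration only.
-- Assumes pg_trgm is already installed in this database.

-- GiST trigram index: supports the % operator and ORDER BY ... <-> ...
CREATE INDEX entry_body_trgm_idx ON entry USING gist (body gist_trgm_ops);

-- Rows similar to 'brat' (above the similarity threshold), nearest first
SELECT body, body <-> 'brat' AS dist
FROM   entry
WHERE  body % 'brat'
ORDER  BY body <-> 'brat'
LIMIT  10;
```

The % filter lets the index discard poor matches early, and ORDER BY ... <-> ... LIMIT allows the planner to walk the index in distance order instead of computing and sorting the distance for every row.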
Wild shot at what you may want:
-- levenshtein() is provided by the fuzzystrmatch extension:
-- CREATE EXTENSION fuzzystrmatch;
WITH x AS (
   SELECT unnest(string_to_array(trim(strip(
          'fat:2,4 cat:3 rat:5A'::tsvector)::text, ''''), ''' ''')) AS val
   )  -- provide tsvector, strip positions, extract lexemes as text
, y AS (SELECT 'brat'::text AS term)  -- provide term to match
SELECT val, term
     , (val <-> term)          AS trg_dist  -- pg_trgm distance operator
     , levenshtein(val, term)  AS lev_dist  -- fuzzystrmatch edit distance
FROM   x, y;
Returns:
val | term | trg_dist | lev_dist
-----+------+----------+----------
cat | brat | 0.875 | 2
fat | brat | 0.875 | 2
rat | brat | 0.714286 | 1
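The trg_dist values can be checked by hand. According to the pg_trgm documentation, each word is treated as having two spaces prefixed and one space suffixed before its trigrams are extracted; similarity is the number of shared trigrams divided by the number of distinct trigrams in both sets, and <-> returns 1 minus that similarity. Below, ␣ marks a padding space (fat works out exactly like cat):

```latex
\begin{aligned}
T(\text{brat}) &= \{\text{␣␣b},\ \text{␣br},\ \text{bra},\ \text{rat},\ \text{at␣}\} \quad (5 \text{ trigrams})\\
T(\text{cat})  &= \{\text{␣␣c},\ \text{␣ca},\ \text{cat},\ \text{at␣}\} \quad (4 \text{ trigrams})\\
T(\text{rat})  &= \{\text{␣␣r},\ \text{␣ra},\ \text{rat},\ \text{at␣}\} \quad (4 \text{ trigrams})\\[4pt]
d(a,b) &= 1 - \frac{|T(a)\cap T(b)|}{|T(a)\cup T(b)|}\\[4pt]
d(\text{cat},\text{brat}) &= 1 - \frac{1}{4+5-1} = 1 - \frac{1}{8} = 0.875\\
d(\text{rat},\text{brat}) &= 1 - \frac{2}{4+5-2} = 1 - \frac{2}{7} \approx 0.714286
\end{aligned}
```

The lev_dist column is ordinary edit distance: rat becomes brat with one insertion, while cat (or fat) needs one substitution plus one insertion. Both measures rank rat as the closest match here.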