Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres stemming throwing out matches

The query

SELECT to_tsvector('recreation') @@ to_tsquery('recreatio:*');

returns false even though 'recreati' is a prefix of 'recreation'. This seems to happen because 'recreation' is stored as its stem, 'recreat'. For example, if we deliberately break the stemming algorithm by running

SELECT to_tsvector('recreation1') @@ to_tsquery('recreatio:*');

the query returns true.

Is there a way to make the first query match?

like image 933
xuanji Avatar asked Aug 13 '15 22:08

xuanji


1 Answers

Not sure if this answer is useful given the age of the question, but:

Concerning stemming

It seems you are right:

select ts_lexize('english_stem','recreation');

outputs

 ts_lexize
-----------
 {recreat}
(1 row)

and the documentation says

Also, * can be attached to a lexeme to specify prefix matching:

SELECT to_tsquery('supern:*A & star:A*B');

Such a lexeme will match any word in a tsvector that begins with the given string.

So it seems there is no way to make original query match.

A solution based on partial matching

One could fallback to looking for partial matches of the stems and the query, e.g. using pg_trgm extension:

SELECT (to_tsvector('recreation creation') @@ to_tsquery('recreatio:*')) or 
  'recreatio:*' % any (
    select trim(both '''' from regexp_split_to_table(strip(to_tsvector('recreation creation'))::text, ' '))
  );

(Maybe the array of stems can be formed in a more elegant way.)

like image 188
overdawn Avatar answered Nov 01 '22 20:11

overdawn