Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this PostgreSQL full-text search break on a different (RDS) server?

On several development servers, this query has returned expected records:

SELECT name, name_tsv FROM vision 
WHERE name_tsv @@ plainto_tsquery('Washington Square Park');

name_tsv was originally populated on my dev server with

UPDATE vision SET name_tsv=to_tsvector(name);

and is kept up to date with a trigger.

I have created the same database on an AWS PostgreSQL RDS instance. Both dev and RDS postgres versions are at 9.3.1. As far as I can tell, pg_catalog on each has the same FTS Configurations, Dictionaries, Parsers, and Templates (default; I haven't messed with anything in here). I cannot access the pg conf file on the RDS instance, of course. On this RDS instance, the above query returns 0 records.

I ran this diagnostic query on both:

SELECT name, 
name_tsv, 
to_tsvector(name), 
plainto_tsquery('Washington Square Park'), 
name_tsv @@ plainto_tsquery('Washington Square Park') AS matches_stored_name,
to_tsvector(name) @@ plainto_tsquery('Washington Square Park') AS matches_fresh_tsvector
FROM vision WHERE id_vision = 2977;

The result on the RDS instance is:

"1609: Washington Square Park";"'1609':1 'park':4 'squar':3 'washington':2";"'1609':1 'park':4 'square':3 'washington':2";"'washington' & 'square' & 'park'";f;t

The result on the dev instance is:

"1609: Washington Square Park";"'1609':1 'park':4 'squar':3 'washington':2";"'1609':1 'park':4 'squar':3 'washington':2";"'washington' & 'squar' & 'park'";t;t

It appears from the above that on RDS, to_tsvector() and plainto_tsquery() both seem not to produce the truncated lexeme 'squar' that they do on the dev server (the same no-lexeme pattern happens with other strings). However, I tried running

UPDATE vision SET name_tsv=to_tsvector(name);

on the RDS server, but name_tsv did not change (still = "'1609':1 'park':4 'squar':3 'washington':2").

What can I do on the new RDS server to make the first query return expected records the same way it does on my dev server? I think I just need to make plainto_tsquery('Washington Square Park') normalize tokens to lexemes (e.g. return 'squar' not 'square'), but I can't tell from this how to do that.

like image 798
Kim Avatar asked Jan 19 '14 23:01

Kim


1 Answers

This query:

SELECT name, name_tsv FROM vision 
WHERE name_tsv @@ plainto_tsquery('Washington Square Park');

uses the default dictionary. I'd say the default dictionary is different between the two machines. See:

regress=> SELECT plainto_tsquery('Washington Square Park');
         plainto_tsquery         
---------------------------------
 'washington' & 'squar' & 'park'
(1 row)

regress=> SELECT plainto_tsquery('english', 'Washington Square Park');
         plainto_tsquery         
---------------------------------
 'washington' & 'squar' & 'park'
(1 row)

regress=> SELECT plainto_tsquery('simple', 'Washington Square Park');
         plainto_tsquery          
----------------------------------
 'washington' & 'square' & 'park'
(1 row)

Compare the result of running:

SHOW default_text_search_config ;

on both machines. Betcha they're different.

like image 124
Craig Ringer Avatar answered Nov 14 '22 14:11

Craig Ringer