Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres ts_rank_cd different result for same tsvector?

I'm testing out a text search query on two separate database servers, both running Postgres 9.4.4.

The row in question has the same data, and I get the same underlying tsvector on both servers:

SELECT
  user_id,
  TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
  TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
  TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
FROM users_v1 where user_id = 123;

-- On server A:
-- user_id    | to_tsvector
-- -----------+----------------------
-- 123        | 'georg':1 'hickman':2
-- (1 row)

-- On server B:
-- user_id    | to_tsvector
-- -----------+----------------------
-- 123        | 'georg':1 'hickman':2
-- (1 row)

However I get a different rank when using this vector to run a query:

SELECT username,
 TS_RANK_CD(
   TO_TSVECTOR('english', REGEXP_REPLACE(first_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
   TO_TSVECTOR('english', REGEXP_REPLACE(last_name, '[^a-zA-Z0-9]', ' ', 'g')) ||
   TO_TSVECTOR('english', REGEXP_REPLACE(username, '[^a-zA-Z0-9]', ' ', 'g'))
 , PLAINTO_TSQUERY('george'))
FROM users WHERE user_id = 123;

-- On server A:
-- user_id    | ts_rank_cd
-- -----------+----------------------
-- 123        | 0.2
-- (1 row)

-- On server B:
-- user_id    | ts_rank_cd
-- -----------+----------------------
-- 123        | 0.0
-- (1 row)

Is the vector the only input to the rank function, or are there any server settings/anything else that affect the behaviour of ts_rank_cd? Is all the information stored in the vector displayed in the console output, or is there some hidden difference in it which I'm not seeing? If not what could be causing the discrepancy?

like image 821
actionshrimp Avatar asked Mar 27 '26 12:03

actionshrimp


1 Answers

Thanks to jjanes's comment, I realised that PLAINTO_TSQUERY also accepts an optional argument to specify the text search configuration. Doing

SELECT PLAINTO_TSQUERY('george');

returned george on one system and georg on the other, but doing

SELECT PLAINTO_TSQUERY('english', 'george');

returned the same on both and caused the expected ranking. Doing

SHOW default_text_search_config;

Revealed pg_catalog.english was set as the default on one system and pg_catalog.simple on the other. The discrepancy can then be fixed by either explicitly passing the config in when creating the TSQUERY, or updating the default config so they are the same on both DBs.

like image 130
actionshrimp Avatar answered Mar 30 '26 04:03

actionshrimp