Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concat two postgresql tsvector fields originating in separate tables into single postgresql view to enable joined full text search

I have a postgresql view that is comprised as a combination of 3 tables:

create view search_view as 
select u.first_name, u.last_name, a.notes, a.summary, a.search_index 
from user as u, assessor as a, connector as c 
where a.connector_id = c.id and c.user_id = u.id;

However, I need to concat tsvector fields from 2 of the 3 table into a single tsvector field in the view which provides full text search across 4 fields: 2 from one table, and 2 from another.

I've read the documentation stating that I can use the concat operator to combine two tsvector fields, but I'm not certain what this looks like syntactically, and also whether there are potential gotchas with this implementation.

I'm looking for example code that concats two tsvector fields from separate tables into a view, and also commentary on whether this is a good or bad practice in postgresql land.

like image 485
David Watson Avatar asked Mar 25 '14 15:03

David Watson


2 Answers

I was wondering the same thing. I don't think we are supposed to be combining tsvectors from multiple tables like this. Best solution is to:

  1. create a new tsv column in each of your tables (user, assessor, connector)
  2. update the new tsv column in each table with all of the text you want to search. for example in the user table you would update the tsv column of all records concatenating first_name and last_name columns.
  3. create an index on the new tsv column, this will be faster than indexing on the individual columns
  4. Run your queries as usual, and let Postgres do the "thinking" about which indexes to use. It may or may not use all indexes in queries involving more than one table.
  5. use the ANALYZE and EXPLAIN commands to look at how Postgres is utilizing your new indexes for particular queries, and this will give you insight into speeding things up further.

This will be my approach at least. I to have been doing lots of reading and have found that people aren't combining data from multiple tables into tsvectors. In fact I don't think this is possible, it may only be possible to use the columns of the current table when creating a tsvector.

like image 65
FireDragon Avatar answered Oct 21 '22 02:10

FireDragon


Concatenating tsv vectors works but as per comments, index is probably not used this way (not an expert, can't say if it does or does not).

SELECT * FROM newsletters
LEFT JOIN campaigns ON newsletters.campaign_id=campaigns.id
WHERE newsletters.tsv || campaigns.tsv @@ to_tsquery(unaccent(?))

The reason why you'd want this is to search for an AND string like txt1 & txt2 & txt 3 which is very common usage scenario. If you simpy split the search by an OR WHERE campaigns.tsv @@ to_tsquery(unaccent(?) this won't work because it will try to match all 3 tokens in both tsv column but the tokens could be in either column.

One solution which I found is to use triggers to insert and update the tsv column in table1 whenever the table2 changes, see: https://dba.stackexchange.com/questions/154011/postgresql-full-text-search-tsv-column-trigger-with-many-to-many but this is not a definitive answer and using that many triggers is error prone and hacky.

Official documentation and some tutorials also show concatenating all the wanted colums into a ts vector on the fly without using a tsv column. But it is unclear how much slower is the on-the-fly versus tsv column approach, I can't find a single benchmark or explanation about this. The documenntation simply states:

Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches. (This is more important when using a GiST index than a GIN index; see Section 12.9.) The expression-index approach is simpler to set up, however, and it requires less disk space since the tsvector representation is not stored explicitly.

All I can tell from this is that tsv columns are probably waste of resources and just complicate things but it'd be nice to see some hard numbers. But if you can concat tsv columns like this, then I guess it's no different than doing it in a WHERE clause.

like image 27
cen Avatar answered Oct 21 '22 01:10

cen