 

Should one store a search_data tsvector in the same table or external table?

I am implementing full text search in postgres.

I would like to search all posts in my system. The posts fulltext index is an amalgamation of the post title and post body.

I see three ways of achieving this:

  1. Create a tsvector column in the posts table and keep it up to date with a trigger.
  2. Create a second table (posts_search) with a post_id column and a tsvector column containing the index data.
  3. Create a simple GIN index ... (out of the question, because my real-world problem needs data from multiple tables in the index).
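A minimal sketch of the DDL for options 1 and 2, written as Python strings so the statements can be passed to any PostgreSQL driver (for example psycopg2's cursor.execute). It assumes a posts table with id (primary key), title, body and deleted_at columns, the english text search configuration, and PostgreSQL 9.5 or later for ON CONFLICT. All index, trigger and function names are hypothetical. Option 1 uses PostgreSQL's built-in tsvector_update_trigger; option 2 needs a small custom trigger function because the target row lives in another table.

```python
# Hypothetical schema: posts(id bigint PRIMARY KEY, title text, body text, deleted_at timestamptz).

# Option 1: tsvector column on posts, maintained by the built-in
# tsvector_update_trigger(tsvector_column, config_name, text_column [, ...]).
OPTION_1_DDL = [
    "ALTER TABLE posts ADD COLUMN search_data tsvector",
    "CREATE INDEX posts_search_data_gin ON posts USING gin (search_data)",
    """CREATE TRIGGER posts_search_data_update
       BEFORE INSERT OR UPDATE ON posts
       FOR EACH ROW
       EXECUTE PROCEDURE tsvector_update_trigger(search_data, 'pg_catalog.english', title, body)""",
]

# Option 2: separate posts_search table, kept in sync by a custom
# AFTER trigger that upserts one row per post (ON CONFLICT needs 9.5+).
OPTION_2_DDL = [
    """CREATE TABLE posts_search (
         post_id bigint PRIMARY KEY REFERENCES posts (id) ON DELETE CASCADE,
         search_data tsvector NOT NULL
       )""",
    "CREATE INDEX posts_search_search_data_gin ON posts_search USING gin (search_data)",
    """CREATE FUNCTION posts_search_sync() RETURNS trigger AS $$
       BEGIN
         INSERT INTO posts_search (post_id, search_data)
         VALUES (NEW.id, to_tsvector('english',
                 coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, '')))
         ON CONFLICT (post_id) DO UPDATE SET search_data = EXCLUDED.search_data;
         RETURN NEW;
       END;
       $$ LANGUAGE plpgsql""",
    """CREATE TRIGGER posts_search_sync_trg
       AFTER INSERT OR UPDATE OF title, body ON posts
       FOR EACH ROW EXECUTE PROCEDURE posts_search_sync()""",
]

if __name__ == "__main__":
    # Print the statements so they can be reviewed or piped into psql.
    for stmt in OPTION_1_DDL + OPTION_2_DDL:
        print(stmt.strip() + ";\n")
```

When the index draws on several tables, as in the real-world case described above, each contributing table would need a similar trigger that recomputes the affected posts_search rows; that is one reason a separate table can be easier to maintain than a column on posts.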

Which approach will perform better, considering that I sometimes need to narrow the search by other attributes in the table (for example, deleted_at IS NULL)?

Is it better to keep the tsvector column in the same table as the data (side effect: SELECT * now sucks) or in a separate table (side effect: a join is required and index filtering is complicated)?
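To make the filtering trade-off concrete, here is a sketch of the same search under both layouts, again as Python strings with psycopg2-style %(q)s placeholders. It assumes the names from the two options in the question (posts.search_data, posts_search.post_id, posts_search.search_data) and the english configuration; the index name and LIMIT are hypothetical. In the single-table layout, a partial GIN index can exclude soft-deleted rows up front. In the separate-table layout, deleted_at lives on posts, so the GIN index on posts_search cannot skip deleted posts; they are discarded on the posts side of the join.

```python
# Option 1: filter column and tsvector share a row, so a partial GIN index
# covering only live posts can be used when the query repeats its predicate.
OPTION_1_PARTIAL_INDEX = (
    "CREATE INDEX posts_live_search_gin ON posts USING gin (search_data) "
    "WHERE deleted_at IS NULL"
)

OPTION_1_QUERY = """
SELECT id, title
FROM posts
WHERE search_data @@ plainto_tsquery('english', %(q)s)
  AND deleted_at IS NULL
ORDER BY ts_rank(search_data, plainto_tsquery('english', %(q)s)) DESC
LIMIT 20
"""

# Option 2: the GIN index on posts_search returns matches for deleted posts
# too; they are filtered out only after joining to posts.
OPTION_2_QUERY = """
SELECT p.id, p.title
FROM posts_search s
JOIN posts p ON p.id = s.post_id
WHERE s.search_data @@ plainto_tsquery('english', %(q)s)
  AND p.deleted_at IS NULL
ORDER BY ts_rank(s.search_data, plainto_tsquery('english', %(q)s)) DESC
LIMIT 20
"""
```

With psycopg2, either query would run as cur.execute(OPTION_1_QUERY, {"q": "full text search"}). One common workaround for option 2 is to delete the posts_search row when a post is soft-deleted, so the side table only ever holds searchable posts.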

Asked Jan 21 '13 00:01 by Sam Saffron



1 Answer

In my experiments, a tsvector column is typically about 1% of the size of the text field it was computed from with to_tsvector().
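To check that ratio on your own data, PostgreSQL's pg_column_size() reports how many bytes a stored value occupies (after any compression). The sketch below assumes the single-table layout, with title, body and search_data all on posts; the column names and the helper function are hypothetical, and your ratio will depend on the text and the text search configuration.

```python
# Average stored size of the tsvector versus the text it was built from.
# coalesce() keeps a NULL title or body from turning the whole sum into NULL.
SIZE_RATIO_SQL = """
SELECT avg(pg_column_size(search_data)) AS avg_vector_bytes,
       avg(coalesce(pg_column_size(title), 0)
           + coalesce(pg_column_size(body), 0)) AS avg_text_bytes
FROM posts
"""


def vector_to_text_ratio(avg_vector_bytes: float, avg_text_bytes: float) -> float:
    """Fraction of the source text's size taken up by the tsvector."""
    if avg_text_bytes <= 0:
        raise ValueError("average text size must be positive")
    return float(avg_vector_bytes) / float(avg_text_bytes)
```

With psycopg2, the two averages come back as Decimal values, so something like vector_to_text_ratio(*cur.fetchone()) after executing SIZE_RATIO_SQL gives the ratio as a float.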

With this in mind, storing the tsvector column in a separate table should provide a performance benefit. Even if you do not use SELECT * (and you really shouldn't), any sequential scan of the original single table still has to load the pages that contain the original text. If you offload the tsvector field to a separate table, that scan has to load roughly 100 times fewer pages.
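The page argument can be checked with back-of-the-envelope arithmetic. The sketch below assumes PostgreSQL's default 8 kB block size, a hypothetical million posts averaging 2,000 bytes of inline text, and the 1% estimate above. It also shows that per-row overhead (a tuple header padded to 24 bytes plus a 4-byte line pointer, roughly 28 bytes) and the post_id column shrink the advantage from about 100x to about 36x. Keep in mind that PostgreSQL's TOAST mechanism compresses large values and moves them out of line (by default once a row exceeds roughly 2 kB), so for long posts a scan of the main table may already skip much of the text; these numbers assume the text is stored inline.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


def seqscan_pages(rows: int, payload_bytes: int, per_row_overhead: int = 0) -> int:
    """Pages a sequential scan must read, ignoring page headers, alignment and free space."""
    total = rows * (payload_bytes + per_row_overhead)
    return (total + PAGE_SIZE - 1) // PAGE_SIZE  # integer ceiling division


ROWS = 1_000_000                  # hypothetical number of posts
TEXT_BYTES = 2_000                # hypothetical average inline title + body size
VECTOR_BYTES = TEXT_BYTES // 100  # the answer's ~1% estimate: 20 bytes
ROW_OVERHEAD = 28                 # ~24-byte tuple header + 4-byte line pointer
POST_ID_BYTES = 8                 # bigint post_id in the side table

if __name__ == "__main__":
    # Payload only: the ratio is roughly the 100x from the answer.
    text_pages = seqscan_pages(ROWS, TEXT_BYTES)
    vector_pages = seqscan_pages(ROWS, VECTOR_BYTES)
    print(text_pages, vector_pages, round(text_pages / vector_pages, 1))

    # With per-row overhead the side table is still much smaller, at about 36x.
    big = seqscan_pages(ROWS, TEXT_BYTES, ROW_OVERHEAD)
    small = seqscan_pages(ROWS, VECTOR_BYTES + POST_ID_BYTES, ROW_OVERHEAD)
    print(big, small, round(big / small, 1))
```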

In other words, I would favor the second solution of offloading the tsvector field to a separate table. Alternatively, you could offload the posts' original text deeper into your table hierarchy (but I guess that is almost the same thing).

Note that full text search does not need the original text to work. You may not even want to store it in the database, or you may store it in a highly compressed format (not necessarily one that SQL routines can easily access). This works as long as something can create the tsvector from the original text, and update it when the text changes.
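A minimal sketch of that idea, assuming the posts_search table from option 2 and PostgreSQL 9.5+ for ON CONFLICT: the application keeps a zlib-compressed copy of the text wherever it likes and sends the raw text to to_tsvector() only when building or refreshing the index row. The function names, the title/body concatenation and the psycopg2-style placeholders are all hypothetical choices for illustration.

```python
import zlib
from typing import Dict, Tuple

# The raw text is only a query parameter here; it is never stored in a column.
UPSERT_SEARCH_SQL = """
INSERT INTO posts_search (post_id, search_data)
VALUES (%(post_id)s, to_tsvector('english', %(text)s))
ON CONFLICT (post_id) DO UPDATE SET search_data = EXCLUDED.search_data
"""


def prepare_post(post_id: int, title: str, body: str) -> Tuple[bytes, Dict[str, object]]:
    """Return (compressed original text, parameters for UPSERT_SEARCH_SQL)."""
    raw = f"{title}\n{body}"
    # Maximum-compression copy for wherever the original is archived
    # (a bytea column, a file, object storage, ...).
    archived = zlib.compress(raw.encode("utf-8"), 9)
    return archived, {"post_id": post_id, "text": raw}


def restore_post(archived: bytes) -> str:
    """Recover the original text, e.g. to rebuild the tsvector after a config change."""
    return zlib.decompress(archived).decode("utf-8")
```

On an edit, the application would call prepare_post again and re-run UPSERT_SEARCH_SQL with the new parameters; restore_post covers the case where the tsvector must be rebuilt from the archived copy.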

Answered Dec 06 '22 15:12 by mvp
