Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a View of multiple tables be used for Full-Text-Search?

I'm sorry to ask such a noob question, but the postgres documentation on views is sparse, and I had trouble finding a good answer.

I'm trying to implement Full-Text-Search on Postgres for three tables. Specifically, the user's search query would return matching 1) other usernames, 2) message, 3) topics.

I'm concerned that using a view for this might not scale well as it combines three tables into one. Is this a legitimate concern? If not, how else might I approach this?

like image 332
dmr07 Avatar asked Jan 03 '23 17:01

dmr07


1 Answers

What you request can be done. To have a practical example (with just two tables), you could have:

CREATE TABLE users
(
    user_id SERIAL PRIMARY KEY,
    username text
) ;

-- Index to find usernames
CREATE INDEX idx_users_username_full_text 
    ON users 
    USING GIN (to_tsvector('english', username)) ;        

CREATE TABLE topics
(
    topic_id SERIAL PRIMARY KEY,
    topic text
) ;

-- Index to find topics
CREATE INDEX idx_topics_topic_full_text 
    ON topics 
    USING GIN (to_tsvector('english', topic)) ;

See PostgreSQL docs. on Controlling Text Search for an explanation of to_tsvector.

... populate the tables

INSERT INTO users
   (username)
VALUES
   ('Alice Cooper'),
   ('Boo Geldorf'),
   ('Carol Burnet'),
   ('Daniel Dafoe') ;

INSERT INTO topics
   (topic)
VALUES
   ('Full text search'),
   ('Fear of void'),
   ('Alice in Wonderland essays') ;

... create a view that combines values from both tables

CREATE VIEW search_items AS
SELECT 
    text 'users' AS origin_table, user_id AS id, to_tsvector('english', username) AS searchable_element
FROM
    users
UNION ALL
SELECT 
    text 'topics' AS origin_table, topic_id AS id, to_tsvector('english', topic) AS searchable_element 
FROM
    topics ;

We search that view:

SELECT 
    *
FROM
    search_items
WHERE
    plainto_tsquery('english', 'alice') @@ searchable_element

... and get the following response (you should mostly ignore the searchable_element). You're mostly interested in the origin_table and id.

origin_table | id | searchable_element               
:----------- | -: | :--------------------------------
users        |  1 | 'alic':1 'cooper':2              
topics       |  3 | 'alic':1 'essay':4 'wonderland':3

See Parsing Queries for an explanation of plainto_tsquery function, and also @@ operator.


To make sure indexes are used:

EXPLAIN ANALYZE
SELECT 
    *
FROM
    search_items
WHERE
    plainto_tsquery('english', 'alice') @@ searchable_element
| QUERY PLAN                                                                                                                                 |
| :----------------------------------------------------------------------------------------------------------------------------------------- |
| Append  (cost=12.05..49.04 rows=12 width=68) (actual time=0.017..0.031 rows=2 loops=1)                                                     |
|   ->  Bitmap Heap Scan on users  (cost=12.05..24.52 rows=6 width=68) (actual time=0.017..0.018 rows=1 loops=1)                             |
|         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                                 |
|         Heap Blocks: exact=1                                                                                                               |
|         ->  Bitmap Index Scan on idx_users_username_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.005..0.005 rows=1 loops=1) |
|               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, username))                                             |
|   ->  Bitmap Heap Scan on topics  (cost=12.05..24.52 rows=6 width=68) (actual time=0.012..0.012 rows=1 loops=1)                            |
|         Recheck Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                    |
|         Heap Blocks: exact=1                                                                                                               |
|         ->  Bitmap Index Scan on idx_topics_topic_full_text  (cost=0.00..12.05 rows=6 width=0) (actual time=0.002..0.002 rows=1 loops=1)   |
|               Index Cond: ('''alic'''::tsquery @@ to_tsvector('english'::regconfig, topic))                                                |
| Planning time: 0.098 ms                                                                                                                    |
| Execution time: 0.055 ms                                                                                                                   |

Indexes are really used (see Bitmap Index Scan on idx_topics_topic_full_text and Bitmap Index Scan on idx_users_username_full_text).

You can check everything at dbfiddle here


NOTE: 'english' is the text search configuration chosen to index and query. Choose the proper one for your case. You can create your own if the existing ones don't fill your needs.

like image 191
joanolo Avatar answered Jan 13 '23 10:01

joanolo