Postgres: Add full-text search on existing varchar column?

Tags:

postgresql

I have an existing Postgres 9.3 database with a table with a varchar column.

        Table "public.frontend_chemical"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 bnf_code  | character varying(9)   | not null
 chem_name | character varying(200) | not null

I would like to run full-text search on the chem_name column.

I have been reading this article, which suggests the steps are as follows:

  1. Add a new tsvector column: ALTER TABLE frontend_chemical ADD COLUMN fts_document tsvector;
  2. Create a function to map the chem_name column to the document, and a trigger to keep it updated.
  3. Create a GIN index on the column: CREATE INDEX chem_fts_index ON frontend_chemical USING gin(fts_document);

Then I should be able to run full-text search queries like: SELECT COUNT(*) FROM frontend_chemical WHERE fts_document @@ 'statin';.

Firstly, is that general process correct?

Secondly, how do I map all the existing entries in the chem_name column to the fts_document column? The example in the article seems to only update the document column when the chem_name column is updated, whereas I have a large existing database.

asked Apr 15 '15 12:04 by Richard


1 Answer

The process is correct, but it may be overkill in your case.

Since only a single column needs to be full-text searched, you can skip the dedicated tsvector column and create just a GIN index on the expression:

CREATE INDEX chem_fts_index ON frontend_chemical
    USING gin(to_tsvector('simple',chem_name));

Instead of simple, you may specify english or another available configuration if linguistic rules such as stemming and stop-word removal are needed.

Then you'll benefit from the index when searching with:

-- the search text must be valid tsquery syntax, e.g. 'statin' or 'word1 & word2'
SELECT bnf_code, chem_name FROM frontend_chemical WHERE
   to_tsvector('simple', chem_name) @@ to_tsquery('simple', 'statin');

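The key point is that the tsvector expression in the query must be exactly the same as the one in the GIN index definition, including the configuration name; otherwise the index cannot be used.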
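One way to check this is to compare query plans with EXPLAIN. The following is a minimal sketch, assuming the chem_fts_index shown above exists and using 'statin' from the question as the search term. The plan actually chosen depends on table size and statistics, so a small table may still get a sequential scan even when the index is usable.

```sql
-- Same expression as in the index definition:
-- the planner is able to use chem_fts_index here.
EXPLAIN
SELECT bnf_code, chem_name
FROM frontend_chemical
WHERE to_tsvector('simple', chem_name) @@ to_tsquery('simple', 'statin');

-- Different configuration ('english' instead of 'simple'):
-- the expression no longer matches the index definition,
-- so chem_fts_index cannot be used for this query.
EXPLAIN
SELECT bnf_code, chem_name
FROM frontend_chemical
WHERE to_tsvector('english', chem_name) @@ to_tsquery('english', 'statin');
```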

This approach has three advantages: it does not require a trigger, it saves the space of a dedicated column whose values are already in the index anyway, and it does not require that column to be initialized (your second question).


Should you want that column anyway, it should be initially populated with an update query of this form:

UPDATE frontend_chemical SET fts_document = to_tsvector('simple', chem_name);

(again, assuming simple as the text search config)
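If you keep the dedicated column, the trigger from step 2 of the question does not have to be hand-written: PostgreSQL ships a built-in trigger function, tsvector_update_trigger, for this purpose. Below is a sketch assuming the fts_document column from the question and the simple configuration. The trigger name is made up for illustration, and the configuration name must be schema-qualified when passed to this function.

```sql
-- Keep fts_document in sync with chem_name on every INSERT or UPDATE.
-- Arguments: tsvector column, configuration name, source text column(s).
CREATE TRIGGER frontend_chemical_fts_update   -- hypothetical trigger name
    BEFORE INSERT OR UPDATE ON frontend_chemical
    FOR EACH ROW
    EXECUTE PROCEDURE
        tsvector_update_trigger(fts_document, 'pg_catalog.simple', chem_name);

-- One-off backfill of the rows that existed before the trigger was created.
UPDATE frontend_chemical
SET fts_document = to_tsvector('simple', chem_name);

-- Queries then target the column (and its GIN index) directly.
SELECT COUNT(*)
FROM frontend_chemical
WHERE fts_document @@ to_tsquery('simple', 'statin');
```

The trigger only fires for rows written after it is created, which is why the separate UPDATE is still needed for existing data.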


EDIT following comments:

to_tsquery() with only one argument uses the default text search configuration (otherwise the configuration name is passed as the first argument). If this default does not match the configuration used in to_tsvector, the query terms may be normalized differently from the indexed ones, and matches can be missed. The default can be changed in several ways:

  • for the duration of the session (not persistent)

        SET default_text_search_config to 'simple';
    
  • for the database (persistent)

        ALTER DATABASE nameofdb SET default_text_search_config to 'simple';
    
  • otherwise, always use the two-argument form of to_tsquery with the explicit text search configuration name as the first argument (I've changed the example above to use that form).
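To see which default is in effect and how the configuration changes the resulting query, something like the following can be run in psql. It is only an illustration: 'statins' is an arbitrary example term, and the english configuration is assumed to be installed (it is part of a standard installation).

```sql
-- Show the configuration used by the one-argument forms.
SHOW default_text_search_config;

-- Compare how two configurations normalize the same term:
-- english applies stemming, simple only lowercases the word.
SELECT to_tsquery('english', 'statins') AS english_query,
       to_tsquery('simple',  'statins') AS simple_query;
```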

To search for a prefix, as you seem to want with Ro, append :* to the term in the query:

to_tsvector('simple', chem_name) @@ to_tsquery('simple', 'Ro:*')
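Put into a complete statement, the prefix search might look like the sketch below. The selected columns and the ORDER BY are illustrative choices, not something the question asks for.

```sql
-- Inspect how the prefix query is parsed; the simple configuration
-- lowercases the term, so this returns 'ro':*
SELECT to_tsquery('simple', 'Ro:*');

-- All chemicals with a word in chem_name that starts with "ro".
SELECT bnf_code, chem_name
FROM frontend_chemical
WHERE to_tsvector('simple', chem_name) @@ to_tsquery('simple', 'Ro:*')
ORDER BY chem_name;
```

Because the term is lowercased before matching, 'Ro:*' and 'ro:*' behave the same way here.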

See Controlling Text Search in the manual for more.

answered Sep 30 '22 05:09 by Daniel Vérité