Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I configure PostgreSQL programmatically to not eliminate stop words in full-text search?

I'm using PostgreSQL full text search for a project where traditional stop words ('a', 'the', 'if' etc.) should be indexed and searchable, which is not the default behaviour. For example, I might want my users to find results for the query 'to be or not to be'.

The documentation indicates that I could achieve this by creating an empty stopwords dictionary in $SHAREDIR/tsearch_data/english.stop (for example), but this will complicate deployment; I want to be able to configure PostgreSQL's stop word handling with SQL. Is this possible? If so, can you provide a sample SQL statement?

like image 488
tomd Avatar asked Sep 30 '09 12:09

tomd


2 Answers

As per your comment on the previous answer, you can easily switch between using no stop words and all stop words. You can acheive this with a custom search configuration:

(1) You can create a custom dictionary without using the stop words file, for example:

CREATE TEXT SEARCH DICTIONARY english_stem_nostop (     Template = snowball     , Language = english ); 

Note, in the above I left out the StopWords parameter.

(2) Then create a new configuration to use your new dictionary:

CREATE TEXT SEARCH CONFIGURATION public.english_nostop ( COPY = pg_catalog.english ); ALTER TEXT SEARCH CONFIGURATION public.english_nostop    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, hword, hword_part, word WITH english_stem_nostop; 

(3) Then, when searching specify the config you want use (alternatively you can change the default_text_search_config option each time), eg:

SELECT     title FROM     articles WHERE     to_tsvector('english_nostop', COALESCE(title,'') || ' ' || COALESCE(body,''))     @@ to_tsquery('english_nostop', 'how & to'); 

You can specify just 'english' in the above SQL to use the normal config.

Note, in this example that using the standard configuration will result in notices because there are only stop words.


Bear in mind the following, however:

  • If you are using indexes, you will need two - one for each configuration. (see these docs: tsearch tables and triggers).
  • Double check which parser tokens you want to use this mapping as per step #2, above (see Parsers).
like image 97
catchdave Avatar answered Sep 27 '22 15:09

catchdave


No, the stop words are only configurable through that file. You can have a server side function that modifies the file, but that requires the file to be write:able by the postgres service user, which you probably don't want.

like image 27
Magnus Hagander Avatar answered Sep 27 '22 17:09

Magnus Hagander