Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql: how to make full text search ignore certain tokens?

is there a magic function or operator to ignore some tokens?

select to_tsvector('the quick. brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick,brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick.brown fox') @@ 'brown' -- returns false, should return true

select to_tsvector('the quick/brown fox') @@ 'brown' -- returns false, should return true
like image 613
Hao Avatar asked May 22 '09 07:05

Hao


People also ask

Is PostgreSQL good for full text search?

Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.

How do I search for a string in PostgreSQL?

The Good Ol' Text Search. You're probably familiar with pattern search, which has been part of the standard SQL since the beginning, and available to every single SQL-powered database: SELECT column_name FROM table_name WHERE column_name LIKE 'pattern'; That will return the rows where column_name matches the pattern .

What is @@ in PostgreSQL?

Basic Text Matching. Full text searching in PostgreSQL is based on the match operator @@ , which returns true if a tsvector (document) matches a tsquery (query).

What is To_tsvector?

to_tsvector parses a textual document into tokens, reduces the tokens to lexemes, and returns a tsvector which lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration.


1 Answers

I'm afraid that you are probably stuck. If you run your terms through ts_debug you will see that 'quick.brown' is parsed as a hostname and 'quick/brown' is parsed as filesystem path. The parser really isn't that clever sadly.

My only suggestion is that you preprocess your texts to convert these tokens to spaces. You could easily create a function in plpgsql to do that.

nicg=# select ts_debug('the quick.brown fox');
                              ts_debug
---------------------------------------------------------------------
 (asciiword,"Word, all ASCII",the,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (host,Host,quick.brown,{simple},simple,{quick.brown})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",fox,{english_stem},english_stem,{fox})
(5 rows)

As you can see from the above you don't get tokens for quick and brown

like image 101
Nic Gibson Avatar answered Oct 01 '22 14:10

Nic Gibson