
Strange behavior with tsquery in PostgreSQL with prefix-lexemes

When I use 'a:*' (the same happens with 'i:*', 's:*' and 't:*'):

SELECT id FROM mv_fulltextsearch1 WHERE to_tsvector(text) @@ to_tsquery('a:*') LIMIT 50;

the query takes forever and prints the following PostgreSQL notice over and over:

NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored

But when I use 'b:*' (or any other single letter in front of ':*'):

SELECT id FROM mv_fulltextsearch1 WHERE to_tsvector(text) @@ to_tsquery('b:*') LIMIT 50;

everything works as expected.

Are a, i, s and t some kind of special characters? How can I escape them or otherwise fix this strange behavior?

asked Jan 30 '18 at 14:01 by Shinigami



People also ask

What is Lexemes in PostgreSQL?

A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such as s or es in English).
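A small illustration of normalization, assuming a standard PostgreSQL installation with the built-in 'english' and 'simple' text search configurations. The English configuration drops the stop word "the" and stems "Cats" to "cat"; the simple configuration only lower-cases. Note that positions are kept even for dropped words, so "cat" stays at position 2.

```sql
-- English: "The" is a stop word (dropped), "Cats" is lower-cased and stemmed
SELECT to_tsvector('english', 'The Cats');   -- 'cat':2

-- Simple: lower-cases only; keeps stop words and does not stem
SELECT to_tsvector('simple', 'The Cats');    -- 'cats':2 'the':1
```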

What is Tsquery in PostgreSQL?

A tsquery value stores lexemes that are to be searched for, and can combine them using the Boolean operators & (AND), | (OR), and ! (NOT), as well as the phrase search operator <-> (FOLLOWED BY).
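A few queries showing these operators together with the :* prefix marker used in the question. This is a sketch assuming PostgreSQL 9.6 or later (needed for <->) and the built-in 'english' configuration; the sample strings are arbitrary.

```sql
-- AND / OR / NOT: "fat" must be present, and either "rat" or no "cat"
SELECT to_tsvector('english', 'a fat rat') @@ to_tsquery('english', 'fat & (rat | !cat)');  -- true

-- Prefix match: 'postgres:*' matches lexemes that start with the stemmed prefix
SELECT to_tsvector('english', 'postgraduate') @@ to_tsquery('english', 'postgres:*');      -- true

-- FOLLOWED BY: "rat" must come directly after "fat"
SELECT to_tsvector('english', 'fat rat') @@ to_tsquery('english', 'fat <-> rat');          -- true
SELECT to_tsvector('english', 'rat fat') @@ to_tsquery('english', 'fat <-> rat');          -- false
```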

How does to_tsvector work?

The to_tsvector function internally calls a parser which breaks the document text into tokens and assigns a type to each token. For each token, a list of dictionaries (Section 12.6 of the PostgreSQL documentation) is consulted, where the list can vary depending on the token type.
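The built-in ts_debug function exposes this parser-plus-dictionaries pipeline token by token, which is a handy way to see why 'a' disappears. A sketch assuming the standard 'english' and 'simple' configurations: in the lexemes column, an empty array {} means the dictionary recognized the token as a stop word.

```sql
-- English: the token "a" goes to english_stem and yields {} (stop word)
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'a fat cat');

-- Simple: the token "a" goes to the simple dictionary and yields {a}
SELECT alias, token, dictionary, lexemes
FROM ts_debug('simple', 'a fat cat');
```

Whitespace between words also shows up as rows (type alias blank) with NULL lexemes; those are not stop words, just tokens no dictionary handles.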

How do I search for text in PostgreSQL?

PostgreSQL provides two functions for this: to_tsvector, which creates a list of tokens (the tsvector data type, where ts stands for "text search"), and to_tsquery, which queries the vector for occurrences of certain words or phrases.


1 Answer

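Use to_tsvector('simple', text) and to_tsquery('simple', 'a:*') instead.

The reason is that the 'english' regconfig (text search configuration), which the one-argument forms use when default_text_search_config is set to english, removes stop words, and "a" is considered a stop word. The same goes for "i", "s" and "t", which are also in the English stop word list. The query is therefore reduced to an empty tsquery that matches no rows, so LIMIT 50 is never satisfied and PostgreSQL presumably ends up scanning the whole table, which would explain both the slowness and the repeated notice.

However, the 'simple' regconfig does not remove stop words.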
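A sketch of the diagnosis and the fix, using the table and column names from the question. The first query shows that the English configuration turns 'a:*' into an empty query while 'b:*' survives; the second is the rewritten search. The index at the end is optional and its name is hypothetical.

```sql
-- Diagnose: compare how each configuration parses the prefix queries
SELECT to_tsquery('english', 'a:*') AS english_a,  -- empty query (raises the NOTICE)
       to_tsquery('english', 'b:*') AS english_b,  -- 'b':*
       to_tsquery('simple',  'a:*') AS simple_a;   -- 'a':*

-- Fix: use the 'simple' configuration on both sides so "a" is kept
SELECT id
FROM mv_fulltextsearch1
WHERE to_tsvector('simple', text) @@ to_tsquery('simple', 'a:*')
LIMIT 50;

-- Optional: an expression index built with the same configuration
-- (the index name is hypothetical)
CREATE INDEX mv_fulltextsearch1_simple_fts_idx
    ON mv_fulltextsearch1 USING gin (to_tsvector('simple', text));
```

Two trade-offs to keep in mind: 'simple' does no stemming, so a search for "cat" will no longer match "cats"; and only the two-argument form of to_tsvector can be used in an expression index, so any existing index built on the 'english' configuration will not be used by the rewritten query.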

answered Oct 05 '22 at 13:10 by Neil McGuigan
