I'm developing a simple articles website in brazilian portuguese language. The search feature is based on a full-text search, but it isn't returning expected results.
I made this on postgresql. Here is the simplified table:
Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.
After creating the table, I ran:
UPDATE artigos SET artigosts =
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');
CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);
CREATE TRIGGER artigosts_tg
BEFORE INSERT OR UPDATE ON artigos
FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');
Yes, I intend to use simple weightning on the searches. Made a index to speed up, a trigger, so I can insert and update without worry about remaking index and so on.
Well, per my understanding, it is everything fine. But results aren't. A simple example.
Let's say I have "... banco de dados ... no banco ... " as one article content. When I do:
SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;
It returns an empty set. I checked the ts_vector column and saw the predicates 'banc' and 'dad'. But I still can't understand why it does not return the row containing the mentioned article.
Can someone bring a light into this question?
Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.
Introduction. Full-text search (FTS) is a technique used by search engines to find results in a database. It can be used to power search results on websites like shops, search engines, newspapers, and more.
A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such as s or es in English).
The reason for this is probably because your default dictionary is set to english. Try the following queries to determine if that is actually the case.
SELECT * FROM ts_debug('banco de dados');
This query will show how the dictionary parses up your search phrase. It should provide the lexemes "banco", "de", and "dado". So what you're actually searching for will not exist in the index and you will receive 0 results.
Now try this:
SELECT * FROM ts_debug('portuguese', 'banco de dados');
It should return the lexemes that exist in the index, "banc" and "dad". If that is the case then you can simply change your search query to get the appropriate result.
SELECT title, intro, content FROM artigos WHERE plainto_tsquery('portuguese', 'banco de dados') @@ artigosts;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With