I want to build a text search with suggestions, like Google's.
I'm using PostgreSQL because of the magical PostGIS.
I was thinking of using FTS (full-text search), but I saw that it could not search partial words, so I found this question and saw how trigrams work.
The main problem is that the search engine I'm working on is for the Spanish language. FTS worked great with stemming and dictionaries (synonyms, misspellings), UTF and so on. Trigrams worked great for partial words, but they only work for ASCII, and (obviously) they don't use things like dictionaries.
I was wondering whether there is any way to use the best of both.
Is it possible to make full-text search and trigrams work together in PostgreSQL?
Yes, You Can Keep Full-Text Search in Postgres: you can go even deeper and make your Postgres full-text search more robust by implementing features such as result highlighting, or by writing your own custom dictionaries or functions.
A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.
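As a rough illustration of the idea above, the pg_trgm extension can show the trigrams it extracts from a string and the similarity score it derives from them. This is only a minimal sketch: it assumes pg_trgm can be installed on the server, and the sample words are arbitrary.

```sql
-- Assumes the pg_trgm contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- List the trigrams extracted from a word
-- (pg_trgm lowercases the word and pads it with blanks before splitting).
SELECT show_trgm('cat');

-- similarity() returns a number between 0 (no shared trigrams)
-- and 1 (identical trigram sets).
SELECT similarity('postgres', 'postgrse') AS typo_score,
       similarity('postgres', 'postgres') AS same_score;  -- same_score is 1

-- The % operator is true when the similarity exceeds the current
-- threshold (pg_trgm.similarity_threshold, 0.3 by default).
SELECT 'postgres' % 'postgrse' AS similar_enough;
```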
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).
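To make the description above concrete, here is a short sketch contrasting a raw cast to tsvector with to_tsvector(). The 'spanish' configuration is one of those shipped with PostgreSQL; the Spanish sentence is an arbitrary sample, and its exact lexemes depend on the installed dictionaries.

```sql
-- A raw cast only splits on whitespace, sorts, and removes duplicates;
-- no normalization happens.
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;
-- 'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat'

-- to_tsvector() additionally normalizes words (stemming, stop words)
-- using the given text search configuration.
SELECT to_tsvector('spanish', 'Los gatos corrían por las calles');
```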
You can do this in Postgres, and don't need Lucene.
You can quote phrases in a tsquery or tsvector, as shown below. You can add :* after a tsquery term to do a prefix search:
select
    '''new york city'''::tsvector  @@ '''new yo'':*'::tsquery,  -- true
    '''new york times'''::tsvector @@ '''new yo'':*'::tsquery,  -- true
    '''new york'''::tsvector       @@ '''new yo'':*'::tsquery,  -- true
    '''new'''::tsvector            @@ '''new yo'':*'::tsquery,  -- false
    'new'::tsvector                @@ '''new yo'':*'::tsquery,  -- false
    'new york'::tsvector           @@ '''new yo'':*'::tsquery;  -- false
The main problem is that to_tsvector() and [plain]to_tsquery() will strip your quotes. You can write your own versions that don't do this (it's not that hard), or do some post-processing after them to build your term n-grams.
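One possible shape for the query side of that post-processing is to quote the raw user input yourself and cast it straight to tsquery, bypassing to_tsquery(). This is a sketch under assumptions: 'new yo' stands in for whatever the user typed, and the input is assumed to contain no backslashes (quote_literal() would then emit an E'...' literal, which the tsquery parser would not accept as written).

```sql
-- quote_literal() wraps the input in single quotes (doubling embedded ones),
-- so the whole phrase becomes one quoted tsquery term; ':*' marks it as a prefix.
SELECT (quote_literal('new yo') || ':*')::tsquery AS prefix_query;

-- Matching against a tsvector whose single lexeme is the whole phrase:
SELECT '''new york city'''::tsvector
       @@ (quote_literal('new yo') || ':*')::tsquery AS matches;  -- true
```

The tsvector side still needs its own handling, since the phrases have to be stored as single quoted lexemes for the prefix match to apply.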
The extra single quotes above are just escapes. With dollar quoting you can write the same kind of literal without doubling them: select $$ i heart 'new york city' $$::tsvector; is equivalent.
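A different way to combine the two features, described in the pg_trgm documentation rather than in this answer, is to build a list of the distinct words in your documents with full-text search and then run trigram matching against that list. The sketch below assumes pg_trgm is installed; the documents table and its bodytext column are hypothetical names.

```sql
-- Hypothetical source table: documents(bodytext text).
-- Collect every distinct word. The 'simple' configuration keeps words
-- unstemmed, so trigram matching runs against the original word forms.
CREATE TABLE words AS
    SELECT word
    FROM ts_stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');

-- Trigram index to speed up similarity lookups on the word list.
CREATE INDEX words_trgm_idx ON words USING GIN (word gin_trgm_ops);

-- Suggest known words close to what the user typed
-- ('caterpiler' is sample, misspelled input).
SELECT word, similarity(word, 'caterpiler') AS sml
FROM words
WHERE word % 'caterpiler'
ORDER BY sml DESC, word
LIMIT 5;
```

A suggested word can then be passed to to_tsquery() with the 'spanish' configuration for the actual full-text search, so stemming and dictionaries still apply. The words table is static here and would need to be rebuilt as the documents change.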