
Suggest like Google with PostgreSQL trigrams and full text search

I want to implement a text search that works like Google's search suggestions.

I'm using PostgreSQL because of the magical PostGIS.

I was thinking of using full text search (FTS), but I saw that it cannot search partial words. Then I found this question and saw how trigrams work.

The main problem is that the search engine I'm working on is for the Spanish language. FTS worked great with stemming and dictionaries (synonyms, misspellings), UTF-8 text, and so on. Trigrams worked great for partial words, but they only work for ASCII, and (obviously) they don't use things like dictionaries.

I was wondering whether there is any way to get the best of both.

Is it possible to make full text search and trigrams work together in PostgreSQL?
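A hedged sketch of one common pattern for combining the two (an assumption, not something stated in this thread): keep a stemmed Spanish full-text index and a pg_trgm index on the same column, accept a row if either one matches, and rank by whichever signal is stronger. The `places` table, its `name` column and the search text 'restaurantes baratos' are hypothetical.

```sql
-- pg_trgm is a contrib extension; installing it needs sufficient privileges.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table of suggestion candidates.
CREATE TABLE IF NOT EXISTS places (
  id   serial PRIMARY KEY,
  name text NOT NULL
);

-- Stemmed full-text index (Spanish configuration) and a trigram index side by side.
-- The two-argument to_tsvector() is immutable, so it can be used in an index.
CREATE INDEX IF NOT EXISTS places_fts_idx
  ON places USING gin (to_tsvector('spanish', name));
CREATE INDEX IF NOT EXISTS places_trgm_idx
  ON places USING gin (name gin_trgm_ops);

-- Match on stemmed lexemes OR on trigram similarity (the % operator uses
-- pg_trgm.similarity_threshold), then rank by the stronger of the two scores.
SELECT name,
       greatest(
         ts_rank(to_tsvector('spanish', name),
                 plainto_tsquery('spanish', 'restaurantes baratos')),
         similarity(name, 'restaurantes baratos')
       ) AS score
FROM places
WHERE to_tsvector('spanish', name) @@ plainto_tsquery('spanish', 'restaurantes baratos')
   OR name % 'restaurantes baratos'
ORDER BY score DESC
LIMIT 10;
```

The FTS branch contributes stemming and dictionaries; the trigram branch catches typos and partial words that never become a full lexeme.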

Asked May 16 '12 at 15:05 by jperelli


People also ask

Is PostgreSQL good for full text search?

Yes, you can keep full-text search in Postgres. You can go even deeper and make your Postgres full-text search more robust by implementing features such as result highlighting, or by writing your own custom dictionaries or functions.

What is trigram in Postgres?

A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.
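In PostgreSQL this is exposed by the pg_trgm extension. A minimal sketch, assuming you are allowed to install the extension; the sample strings are arbitrary:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigrams of one word: pg_trgm pads each word with two leading spaces
-- and one trailing space before cutting it into 3-character pieces.
SELECT show_trgm('cat');

-- Similarity compares the two trigram sets: roughly, shared trigrams over
-- the union of both sets (0 = nothing shared, 1 = identical sets).
SELECT similarity('restaurante', 'restaurnte');

-- % is true when similarity is greater than pg_trgm.similarity_threshold (default 0.3).
SELECT 'restaurante' % 'restaurnte';
```

Because the comparison works on character fragments rather than whole words, a misspelled or truncated word still shares most of its trigrams with the intended one.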

What is Tsvector in PostgreSQL?

A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 of the PostgreSQL documentation for details).
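A small sketch of the difference between a raw cast and real normalization. The sample sentences are arbitrary; the exact Spanish lexemes depend on the Snowball stemmer shipped with your server, so no output is claimed for that query:

```sql
-- A plain cast only splits on whitespace, sorts, and removes duplicates:
-- no lowercasing, stemming or stop-word removal happens here.
SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector;

-- to_tsvector() with a text search configuration normalizes words into lexemes;
-- the built-in 'spanish' configuration applies Spanish stemming and stop words.
SELECT to_tsvector('spanish', 'Los gatos comen pescado');
```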


1 Answer

You can do this in Postgres; you don't need Lucene.

You can quote phrases in a tsquery or tsvector as shown below, and you can add :* after a tsquery term to do a prefix search:

select
'''new york city'''::tsvector   @@ '''new yo'':*'::tsquery, --true
'''new york times'''::tsvector  @@ '''new yo'':*'::tsquery, --true
'''new york'''::tsvector        @@ '''new yo'':*'::tsquery, --true
'''new'''::tsvector             @@ '''new yo'':*'::tsquery, --false
'new'::tsvector                 @@ '''new yo'':*'::tsquery, --false
'new york'::tsvector            @@ '''new yo'':*'::tsquery; --false

The main problem is that to_tsvector() and [plain]to_tsquery() will strip your quotes. You can write your own versions that don't do this (it's not that hard), or post-process their output to build your term n-grams.
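A minimal sketch of the "write your own versions" route, assuming PostgreSQL 9.6 or later (for array_to_tsvector()) and plain whitespace tokenization with no stemming. Both function names, phrase_ngrams_tsvector and phrase_prefix_tsquery, are hypothetical helpers, not built-ins:

```sql
-- Hypothetical helper: lowercases the text, splits it on whitespace, and
-- returns a tsvector whose lexemes are every run of 1..max_n consecutive words,
-- e.g. 'new york city' -> city, new, new york, new york city, york, york city.
CREATE OR REPLACE FUNCTION phrase_ngrams_tsvector(txt text, max_n int DEFAULT 3)
RETURNS tsvector
LANGUAGE sql IMMUTABLE AS $fn$
  WITH words AS (
    SELECT w, ord
    FROM unnest(regexp_split_to_array(lower(trim(txt)), '\s+'))
         WITH ORDINALITY AS t(w, ord)
    WHERE w <> ''
  ), grams AS (
    -- For each start word and each length n, glue the next n words together;
    -- HAVING drops n-grams that would run past the end of the text.
    SELECT string_agg(w2.w, ' ' ORDER BY w2.ord) AS gram
    FROM words AS w1
    CROSS JOIN generate_series(1, max_n) AS g(n)
    JOIN words AS w2 ON w2.ord BETWEEN w1.ord AND w1.ord + g.n - 1
    GROUP BY w1.ord, g.n
    HAVING count(*) = g.n
  )
  -- Lexemes are used as-is (no further parsing), so spaces are preserved.
  SELECT array_to_tsvector(array_agg(DISTINCT gram)) FROM grams;
$fn$;

-- Hypothetical helper: turns non-empty user input into one quoted prefix term,
-- e.g. 'New  Yo' -> 'new yo':*. Backslashes and single quotes are doubled,
-- as tsquery quoting requires.
CREATE OR REPLACE FUNCTION phrase_prefix_tsquery(txt text)
RETURNS tsquery
LANGUAGE sql IMMUTABLE AS $fn$
  SELECT (''''
          || replace(replace(regexp_replace(lower(trim(txt)), '\s+', ' ', 'g'),
                             '\', '\\'),
                     '''', '''''')
          || ''':*')::tsquery;
$fn$;

SELECT phrase_ngrams_tsvector('New York City') @@ phrase_prefix_tsquery('new yo');
```

Indexing phrase_ngrams_tsvector(name) with GIN trades index size for multi-word prefix matching; since the lexemes bypass the text search parser, this path does not get the Spanish stemming that to_tsvector('spanish', ...) provides.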

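The doubled single quotes above are just escapes. You can avoid them with dollar quoting: select $$ i heart 'new york city' $$::tsvector; is equivalent, and yields the lexemes 'i', 'heart' and the single quoted lexeme 'new york city'.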

Answered Oct 19 '22 at 17:10 by Neil McGuigan