Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL Full Text Search and Trigram Confusion

I'm a little bit confused with the whole concept of PostgreSQL, full text search and Trigram. In my full text search queries, I'm using tsvectors, like so:

SELECT * FROM articles WHERE search_vector @@ plainto_tsquery('english', 'cat, bat, rat'); 

The problem is, this method doesn't account for misspelling. Then I started to read about Trigram and pg_trgm:

Looking through other examples, it seems like trigram is used or vectors are used, but never both. So my questions are: Are they ever used together? If so, how? Does trigram replace full text? Are trigrams more accurate? And how are trigrams on performance?

like image 710
Devin Dixon Avatar asked Apr 08 '13 16:04

Devin Dixon


1 Answers

They serve very different purposes.

  • Full Text Search is used to return documents that match a search query of stemmed words.
  • Trigrams give you a method for comparing two strings and determining how similar they look.

Consider the following examples:

SELECT 'cat' % 'cats'; --true 

The above returns true because 'cat' is quite similar to 'cats' (as dictated by the pg_trgm limit).

SELECT 'there is a cat with a dog' % 'cats'; --false 

The above returns false because % is looking for similarily between the two entire strings, not looking for the word cats within the string.

SELECT to_tsvector('there is a cat with a dog') @@ to_tsquery('cats'); --true 

This returns true becauase tsvector transformed the string into a list of stemmed words and ignored a bunch of common words (stop words - like 'is' & 'a')... then searched for the stemmed version of cats.

It sounds like you want to use trigrams to auto-correct your ts_query but that is not really possible (not in any efficient way anyway). They do not really know a word is misspelt, just how similar it might be to another word. They could be used to search a table of words to try and find similar words, allowing you to implement a "did you mean..." type feature, but this word require maintaining a separate table containing all the words used in your search field.

If you have some commonly misspelt words/phrases that you want the text-index to match you might want to look at Synonym Dictorionaries

like image 197
Chris Farmiloe Avatar answered Oct 06 '22 13:10

Chris Farmiloe