Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gin index and difference between gin_trgm_ops and to_tsvector in PostgreSQL

Tags:

sql

postgresql

I am trying to understand how to use gin index for full text search in PostgreSQL. And I see that there are two ways to do it. For example, if we have a table:

CREATE TABLE IF NOT EXISTS users (
    id SERIAL NOT NULL,
    name VARCHAR(512) NOT NULL,
    PRIMARY KEY (id));

Then we can create index using such variants:

CREATE INDEX users_name_idx ON users USING gin (name gin_trgm_ops);

or

CREATE INDEX users_name_idx ON users USING gin (to_tsvector('language', name));

As I understand (I can be wrong) the first variant divides text to three letter tokens and doesn't depend on the language. The second variant uses stemming to return list of word roots and that's why depends on language.

My question - is my understanding correct and in what cases I should use the first variant and in what cases the second.


1 Answers

What you say is correct.

In addition to that, and maybe the most important difference, full text search can only search for whole words, while a trigram index can be used to search for arbitrary substrings and also find results that are only similar to the search condition (using the distance operator).

Unsurprisingly, trigram indexes don't perform well for short search strings.

like image 154
Laurenz Albe Avatar answered Sep 22 '25 11:09

Laurenz Albe