Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between GiST and GIN index

I am implementing a table that has a column with a data type of tsvector and I am trying to understand what index would be better to use?

GIN or GiST?

In looking through the postgres documentation here I seem to get that:

  • GiST is faster to update and build the index and less accurate than gin.

  • GIN is slower to update and build the index but is more accurate.

OK, so why would anybody want a gist indexed field over gin? If gist could give you the wrong results? There must be some advantage (outside performance) on this.

Can anybody explain in layman's terms when I would want to use GIN vs. GiST?

like image 296
Walker Farrow Avatar asked Mar 10 '15 22:03

Walker Farrow


People also ask

What is a gin index?

GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items.

What is hash index in PostgreSQL?

Hash Index. Just like the name suggests, Hash indexes in PostgreSQL use a form of the hash table data structure. Hash table is a common data structure in many programming languages. For example, a Dict in Python, a HashMap in Java or the new Map type in JavaScript.


1 Answers

I don't think I could explain it better than the manual already does:

In choosing which index type to use, GiST or GIN, consider these performance differences:

  • GIN index lookups are about three times faster than GiST

  • GIN indexes take about three times longer to build than GiST

  • GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled [...]

  • GIN indexes are two-to-three times larger than GiST indexes

Link and quote refer to the manual for Postgres 9.4. Size and performance estimates seemed slightly outdated already. With Postgres 9.4 the odds have shifted substantially in favor of GIN.
The release notes of Postgres 9.4 include:

  • Reduce GIN index size (Alexander Korotkov, Heikki Linnakangas) [...]

  • Improve speed of multi-key GIN lookups (Alexander Korotkov, Heikki Linnakangas)

Size and performance estimates have since been removed from the manual.

Note that there are special use cases that require one or the other.

One thing you misunderstood: You never get wrong results with a GiST index. The index operates on hash values, which can lead to false positives in the index. This should only become relevant with a very big number of different words in your documents. False positives are eliminated after re-checking the actual row in any case. The manual:

A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.)

Bold emphasis mine.

like image 109
Erwin Brandstetter Avatar answered Sep 26 '22 02:09

Erwin Brandstetter