I have to decide whether to use GIN or GiST indexing for an hstore column.
The Postgres docs state:
The way I interpret it: use GIN if you need to query a lot, and GiST if you need to update a lot.
In this test, all three disadvantages of GIN compared with GiST mentioned above are confirmed. However, contrary to what the Postgres docs suggest, the advantage of GIN over GiST (faster lookup) is very small. Slide 53 shows that in the test GIN was only 2% to 3% faster, as opposed to the 200% to 300% suggested in the Postgres docs.
Which source of information is more reliable and why?
In Postgres, a B-tree index is what you most commonly want. Using an index is much faster than a sequential scan because it may only have to read a few pages as opposed to sequentially scanning thousands of them (when you're returning only a few records). If you run a standard CREATE INDEX, it creates a B-tree for you.
For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.
A unique index guarantees that the table won't have more than one row with the same value. It's advantageous to create unique indexes for two reasons: data integrity and performance. Lookups on a unique index are generally very fast.
GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items.
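To make the inverted-index idea above concrete, here is a toy Python model. It is only a sketch of the concept: the function names and sample rows are made up for illustration, and it is not how the hstore GIN operator class actually stores its entries. Each element of a composite value points back to the rows that contain it, so a containment query becomes an intersection of those row sets.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each (key, value) element to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, attrs in rows.items():
        for item in attrs.items():
            index[item].add(row_id)
    return index


def rows_containing(index, wanted):
    """Return ids of rows whose attributes contain every pair in `wanted`."""
    posting_lists = [index.get(item, set()) for item in wanted.items()]
    if not posting_lists:
        # Toy choice: an empty query matches nothing in this sketch.
        return set()
    return set.intersection(*posting_lists)


# Hypothetical sample data standing in for an hstore column.
rows = {
    1: {"color": "red", "size": "L"},
    2: {"color": "red", "size": "M"},
    3: {"color": "blue", "size": "L"},
}
index = build_inverted_index(rows)
matches = rows_containing(index, {"color": "red", "size": "L"})
print(sorted(matches))
```

The sketch also hints at why updates cost more with this layout: inserting one row touches one index entry per element it contains.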
The docs state what the situation is "in general".
However, you aren't running PostgreSQL "in general"; you are running it on specific hardware with a specific pattern of use.
So if you care a lot, you'll want to test it yourself. A GiST index will always require re-checking its condition. However, if the queries you run end up doing further checks anyway, a GIN index might not win there. There are also all the usual issues around cache usage and so on.
For my usage, on smaller databases with moderate update rates, I've been happy enough with GiST. I've seen a 50% improvement in speed with GIN (across a whole query), but it's not been worth the slower indexing. If I were building a huge archive server, it might be different.
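If you do want to test it yourself, something like the following could be a starting point. It is a minimal sketch that only prints the SQL to run by hand; the table name `items`, the column name `attrs`, and the probe value are hypothetical placeholders, so substitute your own table and a query that is representative of your workload.

```python
def benchmark_statements(table="items", column="attrs", probe="color=>red"):
    """Return SQL to run, in order, for a GiST-vs-GIN comparison on one hstore column.

    The defaults are placeholder names, not taken from any real schema.
    Identifiers are interpolated as-is, so only pass trusted names.
    """
    query = f"EXPLAIN ANALYZE SELECT * FROM {table} WHERE {column} @> '{probe}';"
    steps = []
    for method in ("gist", "gin"):
        index_name = f"{table}_{column}_{method}_idx"
        # Build the index, run the same containment query, then drop the index
        # so the next method is measured on its own.
        steps.append(f"CREATE INDEX {index_name} ON {table} USING {method} ({column});")
        steps.append(query)
        steps.append(f"DROP INDEX {index_name};")
    return steps


for statement in benchmark_statements():
    print(statement)
```

Running the output in psql with `\timing` on also shows how long each index takes to build. Lookup speed is only half of the trade-off, so it is worth timing a batch of your typical updates with each index in place as well.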