Why are PostgreSQL Text-Search GiST indexes so much slower than GIN indexes?

Tags: performance, full-text-search, postgresql

I'm testing out the PostgreSQL Text-Search features, using the September data dump from StackOverflow as sample data. :-)

The naive approach of using LIKE predicates or POSIX regular-expression matching to search 1.2 million rows for a keyword takes about 90-105 seconds (on my MacBook), since each query does a full table scan:

SELECT * FROM Posts WHERE body LIKE '%postgresql%';
SELECT * FROM Posts WHERE body ~ 'postgresql';

An unindexed, ad hoc text-search query takes about 8 minutes:

SELECT * FROM Posts WHERE to_tsvector(body) @@ to_tsquery('postgresql');

Creating a GIN index takes about 40 minutes:

ALTER TABLE Posts ADD COLUMN PostText TSVECTOR;
UPDATE Posts SET PostText = to_tsvector(body);
CREATE INDEX PostText_GIN ON Posts USING GIN(PostText);

(I realize I could also do this in one step by defining it as an expression index.)

Afterwards, a query assisted by the GIN index runs much faster, taking about 40 milliseconds:

SELECT * FROM Posts WHERE PostText @@ 'postgresql';

However, when I create a GiST index, the results are quite different. It takes less than 2 minutes to create the index:

CREATE INDEX PostText_GIST ON Posts USING GIST(PostText);

Afterwards, a query using the @@ text-search operator takes 90-100 seconds. So the GiST index does improve an unindexed text-search query from 8 minutes to 1.5 minutes, but that's no better than a full table scan with LIKE, and far too slow to be useful in a web application.

Am I missing something crucial to using GiST indexes? Do the indexes need to be pre-cached in memory or something? I am using a plain PostgreSQL installation from MacPorts, with no tuning.

What is the recommended way to use GiST indexes? Or does everyone doing text search with PostgreSQL skip GiST indexes and use only GIN indexes?

PS: I do know about alternatives like Sphinx Search and Lucene. I'm just trying to learn about the features provided by PostgreSQL itself.


asked Oct 08 '09 20:10

Bill Karwin

1 Answer

The PostgreSQL documentation has a good overview of the performance differences between GiST and GIN indexes, if you're interested: see the section "GiST and GIN Index Types".
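That documentation section comes down to one structural difference. A GiST index on a tsvector stores a fixed-length signature per row, with each word hashed to a single bit, so it is lossy and every hit must be rechecked against the actual table row. A GIN index is an inverted index that maps each lexeme directly to the rows containing it. The toy Python model below is a sketch under stated assumptions, not PostgreSQL code: the 16-bit signature width and every function name are hypothetical, and it scans a flat list of signatures instead of a tree. It illustrates why long documents plausibly hurt GiST: a post with many distinct words sets nearly every bit, so it "matches" almost any query and has to be fetched and rechecked.

```python
import hashlib
from collections import defaultdict

# Toy model only (hypothetical): real PostgreSQL signatures are much wider,
# and GiST arranges them in a tree. Here we keep a flat list of signatures.
SIG_BITS = 16  # deliberately tiny signature width, chosen for illustration


def bit_for(word: str) -> int:
    """Hash one lexeme to a single bit position in the signature."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % SIG_BITS


def build_signatures(docs: dict[int, set[str]]) -> dict[int, int]:
    """GiST-like: one fixed-length signature per row, word bits ORed together."""
    sigs = {}
    for doc_id, words in docs.items():
        sig = 0
        for w in words:
            sig |= 1 << bit_for(w)
        sigs[doc_id] = sig
    return sigs


def gist_search(docs, sigs, word):
    """Lossy lookup on signatures, then a recheck against the real row contents."""
    bit = 1 << bit_for(word)
    candidates = sorted(d for d, s in sigs.items() if s & bit)
    # Recheck step: visit each candidate "row" to discard false matches.
    matches = [d for d in candidates if word in docs[d]]
    return candidates, matches


def build_inverted(docs: dict[int, set[str]]) -> dict[str, set[int]]:
    """GIN-like: map each lexeme to the set of rows that contain it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            index[w].add(doc_id)
    return index


def gin_search(index, word):
    """Exact lookup: returns only rows that really contain the lexeme."""
    return sorted(index.get(word, set()))


if __name__ == "__main__":
    docs = {
        1: {"postgresql", "index"},
        2: {"mysql", "join"},
        # A long post: enough distinct words to set (nearly) every signature bit.
        3: {f"w{i}" for i in range(1000)},
    }
    sigs = build_signatures(docs)
    candidates, matches = gist_search(docs, sigs, "postgresql")
    print("GiST candidates:", candidates, "after recheck:", matches)
    print("GIN matches:", gin_search(build_inverted(docs), "postgresql"))
```

In this sketch `gist_search` returns post 3 as a candidate even though it never mentions "postgresql", and only the recheck removes it, whereas `gin_search` never touches post 3. As I understand it, real GiST also ORs child signatures into its inner pages, so saturated signatures additionally stop it from pruning whole subtrees.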


answered Sep 23 '22 02:09

mattonrails
