Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Proper full text index Rails/PostgreSQL/pg_search

I am testing performance for PostgreSQL full text search (using pg_search gem) and solr (sunspot_solr gem).

For 4 million records I am getting 13456 ms for Tsearch and 800 ms with SOLR (that is SOLR query + DB retrival). It is obvious that I need index but I am not sure how to create one for full text search. I investigated and found that for full text search I should use GIN index.

execute "CREATE INDEX products_gin_title ON products USING GIN(to_tsvector('english', title))"

But I am searching via two more columns and I need multi-value index and I am not sure how to implement it? I am not very familiar with DB part. My search code looks like:

@results = Product.search_title(params[:search_term]).where("platform_id=? AND product_type=?", params[:platform_id], params[:type_id]).limit(10).all

How do I create proper query for this type of situations?

Here is SQL output from rails for search term car.

Product Load (12494.0ms)
SELECT 
    "products".*, 
    ( ts_rank((to_tsvector('simple', coalesce("products"."title"::text, ''))), (to_ tsquery('simple', ''' ' || 'car' || ' ''')), 2) ) AS pg_search_rank 
FROM "products" 
WHERE (((to_tsvector('simple', coalesce("products"."tit le"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'car' || ' ''')))) 
    AND (platform_id='26' AND product_type='2') 
ORDER BY pg_search_rank DESC, "products"."id" ASC 
LIMIT 10

EDIT:

I am using PostgreSQL 8.4.11, EXPLAIN ANALYZE output is following.

Limit  (cost=108126.34..108126.36 rows=10 width=3824) (actual time=12228.736..12228.738 rows=10 loops=1)   
->  Sort (cost=108126.34..108163.84 rows=14999 width=3824) (actual time=12228.733..12228.734 rows=10 loops=1)
    Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)), '''car'''::tsquery, 2)), id
    Sort Method:  top-N heapsort  Memory: 18kB
    ->  Seq Scan on products  (cost=0.00..107802.22 rows=14999 width=3824) (actual time=7.532..12224.585 rows=977 loops=1)
        Filter: ((platform_id = 26) AND (product_type = 2) AND (to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)) @@ '''car'''::tsquery)) 

Total runtime: 12228.813 ms
like image 267
Haris Krajina Avatar asked Jun 04 '12 11:06

Haris Krajina


People also ask

Is PostgreSQL good for full text search?

Yes, You Can Keep Full-Text Search in Postgres You can get even deeper and make your Postgres full-text search even more robust, by implementing features such as highlighting results, or writing your own custom dictionaries or functions.

What is FTS in PostgreSQL?

To summarize, we learnt how to perform full-text search operation in PostgreSQL. If you liked our article, check out the book Mastering PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.

What is Tsvector in PostgreSQL?

tsvector. A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).

How do I find text in PostgreSQL function?

We have search text by using the to_tsvector function in PostgreSQL. In to_tsvector, “ts” is defined as text search. In to_tsvector, the tsvector is the data type of to_tsvector function. This function will return the lexeme tokens with pointers in PostgreSQL.


1 Answers

This expression:

to_tsvector('simple', (COALESCE(title::TEXT), ''))

is not sargable against your index.

You should declare the index on the exactly that expression which is used in the query:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT,'')))

(or make ruby generate the expression which is used in the index).

If you want multiple columns to be indexed, just concatenate them:

CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', title || ' ' || product_type || ' ' || platform_id))

but again, Ruby should be filtering on exactly same expression for the index to be of use.

like image 133
Quassnoi Avatar answered Oct 04 '22 21:10

Quassnoi