I am comparing the performance of PostgreSQL full-text search (using the pg_search gem) against Solr (the sunspot_solr gem).
With 4 million records, a search takes 13456 ms with Tsearch and 800 ms with Solr (Solr query plus DB retrieval). I obviously need an index, but I am not sure how to create one for full-text search. From what I have read, a GIN index is the right choice for full-text search:
execute "CREATE INDEX products_gin_title ON products USING GIN(to_tsvector('english', title))"
However, I also filter on two more columns, so I think I need a multi-column index, and I am not sure how to implement one. I am not very familiar with the database side. My search code looks like this:
@results = Product.search_title(params[:search_term]).where("platform_id=? AND product_type=?", params[:platform_id], params[:type_id]).limit(10).all
How do I create a proper query for this kind of situation?
Here is the SQL that Rails generates for the search term car:
Product Load (12494.0ms)
SELECT
"products".*,
( ts_rank((to_tsvector('simple', coalesce("products"."title"::text, ''))), (to_tsquery('simple', ''' ' || 'car' || ' ''')), 2) ) AS pg_search_rank
FROM "products"
WHERE (((to_tsvector('simple', coalesce("products"."title"::text, ''))) @@ (to_tsquery('simple', ''' ' || 'car' || ' '''))))
AND (platform_id='26' AND product_type='2')
ORDER BY pg_search_rank DESC, "products"."id" ASC
LIMIT 10
EDIT:
I am using PostgreSQL 8.4.11. The EXPLAIN ANALYZE output is as follows:
Limit (cost=108126.34..108126.36 rows=10 width=3824) (actual time=12228.736..12228.738 rows=10 loops=1)
-> Sort (cost=108126.34..108163.84 rows=14999 width=3824) (actual time=12228.733..12228.734 rows=10 loops=1)
Sort Key: (ts_rank(to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)), '''car'''::tsquery, 2)), id
Sort Method: top-N heapsort Memory: 18kB
-> Seq Scan on products (cost=0.00..107802.22 rows=14999 width=3824) (actual time=7.532..12224.585 rows=977 loops=1)
Filter: ((platform_id = 26) AND (product_type = 2) AND (to_tsvector('simple'::regconfig, COALESCE((title)::text, ''::text)) @@ '''car'''::tsquery))
Total runtime: 12228.813 ms
tsvector. A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details).
The to_tsvector function converts document text into a tsvector: it parses the text into tokens, normalizes the tokens to lexemes, and returns the lexemes together with their positions in the document.
This expression:
to_tsvector('simple', COALESCE(title::TEXT, ''))
is not sargable against your index, because the index was built on a different expression: to_tsvector('english', title).
You should declare the index on exactly the expression that is used in the query:
CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT,'')))
(or make Ruby generate the expression that is used in the index).
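One way to keep the two in sync is to define the expression once and build both the index statement and the search condition from it. The following is only a minimal sketch: the module name TitleSearch, the index name products_gin_title_simple and both helper methods are hypothetical, and it assumes you write the search condition yourself instead of relying on the SQL that pg_search generates.

```ruby
# Hypothetical helper (not part of pg_search): the indexed expression lives
# in one constant, so the index and the query cannot drift apart.
module TitleSearch
  CONFIG = 'simple'.freeze

  # The expression the planner has to see in both the index and the query.
  TSVECTOR_SQL = "to_tsvector('#{CONFIG}', coalesce(title::text, ''))".freeze

  # SQL for a migration; the index name is an assumption.
  def self.create_index_sql(index_name = 'products_gin_title_simple')
    "CREATE INDEX #{index_name} ON products USING GIN(#{TSVECTOR_SQL})"
  end

  # WHERE fragment with a bind placeholder for the search term.
  def self.where_sql
    "#{TSVECTOR_SQL} @@ to_tsquery('#{CONFIG}', ?)"
  end
end
```

In a migration this could be run as execute TitleSearch.create_index_sql, and in a query as Product.where(TitleSearch.where_sql, term).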
If you want multiple columns to be indexed, just concatenate them (casting non-text columns to text explicitly):
CREATE INDEX products_gin_title
ON products
USING GIN(to_tsvector('simple', COALESCE(title::TEXT, '') || ' ' || product_type::TEXT || ' ' || platform_id::TEXT))
but again, Ruby has to filter on exactly the same expression for the index to be of use.
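The concatenated index only helps when the query searches that same concatenated expression, which the query in the question does not do: it matches the title and filters platform_id and product_type with plain equality. A different approach, not the one described in this answer, is to keep the text index on the title alone and add an ordinary B-tree index on the two filter columns, since PostgreSQL can combine several indexes in a bitmap scan. A minimal Ruby sketch, with hypothetical index names and a hypothetical helper method:

```ruby
# Sketch only: one GIN index for the text match and one B-tree index for
# the equality filters. Both index names are made up for illustration.
INDEX_STATEMENTS = [
  "CREATE INDEX products_gin_title_simple ON products " \
  "USING GIN(to_tsvector('simple', coalesce(title::text, '')))",
  "CREATE INDEX products_platform_type ON products (platform_id, product_type)"
].freeze

# Passes each statement to the given callable. In a Rails migration the
# callable would wrap `execute`; here it is left abstract.
def run_index_statements(executor)
  INDEX_STATEMENTS.each { |sql| executor.call(sql) }
end
```

Inside a migration this might be called as run_index_statements(method(:execute)). Whether the planner actually uses both indexes depends on the data, so check the result with EXPLAIN ANALYZE.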