I have 300 million addresses in my PostgreSQL 9.3 DB and I want to use pg_trgm to fuzzy search the rows. The final purpose is to implement a search function just like Google Map search.
When I used pg_trgm to search these addresses, it cost about 30s to get the results. There are many rows matching the default similarity threshold condition of 0.3 but I just need about 5 or 10 results. I created a trigram GiST index:
CREATE INDEX addresses_trgm_index ON addresses USING gist (address gist_trgm_ops);
This is my query:
SELECT address, similarity(address, '981 maun st') AS sml
FROM addresses
WHERE address % '981 maun st'
ORDER BY sml DESC
LIMIT 10;
The test table on production environment has been removed. I show the EXPLAIN
output from my test environment. There are about 7 million rows and it needs about 1.6s to get results. With 300 million, it needs more than 30s.
ebdb=> explain analyse select address, similarity(address, '781 maun st') as sml from addresses where address % '781 maun st' order by sml desc limit 10;
QUERY PLAN
————————————————————————————————————————————————————————————————————————————————
Limit (cost=7615.83..7615.86 rows=10 width=16) (actual time=1661.004..1661.010 rows=10 loops=1)
-> Sort (cost=7615.83..7634.00 rows=7268 width=16) (actual time=1661.003..1661.005 rows=10 loops=1)
Sort Key: (similarity((address)::text, '781 maun st'::text))
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using addresses_trgm_index on addresses (cost=0.41..7458.78 rows=7268 width=16) (actual time=0.659..1656.386 rows=5241 loops=1)
Index Cond: ((address)::text % '781 maun st'::text)
Total runtime: 1661.066 ms
(7 rows)
Is there a good way to improve the performance or is it a good plan to do table partitioning?
PostgreSQL 9.3 ... Is there a good way to improve the performance or is it a good plan to do table partitioning?
Table partitioning will not help at all.
But yes, there is a good way: Upgrade to a current version of Postgres. There have been many improvements for GiST indexes, for the pg_trgm module in particular and for big data in general. Should be substantially faster with Postgres 10.
Your "nearest neighbor" search looks correct but for a small LIMIT
use this equivalent query instead:
SELECT address, similarity(address, '981 maun st') AS sml
FROM addresses
WHERE address % '981 maun st'
ORDER BY address <-> '981 maun st'
LIMIT 10;
Quoting the manual:
It will usually beat the first formulation when only a small number of the closest matches is wanted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With