Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I optimise this LIKE JOIN query?

This query finds the suffix of a domain:

        SELECT
        DISTINCT ON ("companyDomain".id)
            "companyDomain".domain,
            "publicSuffix".suffix
        FROM
            "companyDomain"
        INNER JOIN
            "publicSuffix"
        ON
            REVERSE("companyDomain".domain) LIKE REVERSE("publicSuffix".suffix) || '%'
        ORDER BY
            "companyDomain".id, LENGTH("publicSuffix".suffix) DESC

Edit: Notice this also works with subdomains.

You can fiddle with the example here and visualize the plan with pev. I've tried adding covering indexes to the tables but they end up not being used by the query planner. Also perhaps there's another query that could be more efficient?

like image 459
DonDrapper Avatar asked Jun 20 '20 11:06

DonDrapper


1 Answers

Have you considered using a gin index ?

I made the following modifications to your sample DML:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
...
CREATE INDEX companyDomain_domain_reverse ON "companyDomain" USING gin (REVERSE(domain) gin_trgm_ops);
...
CREATE INDEX publicSuffix_suffix_reverse ON "publicSuffix" USING gin (REVERSE(suffix) gin_trgm_ops);

And here is the query plan:

+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|Unique  (cost=40802.07..41004.44 rows=908 width=31) (actual time=98.229..98.356 rows=908 loops=1)                                                       |
|  ->  Sort  (cost=40802.07..40903.26 rows=40474 width=31) (actual time=98.228..98.264 rows=1006 loops=1)                                                |
|        Sort Key: "companyDomain".id, (length(("publicSuffix".suffix)::text)) DESC                                                                      |
|        Sort Method: quicksort  Memory: 103kB                                                                                                           |
|        ->  Nested Loop  (cost=0.05..37704.86 rows=40474 width=31) (actual time=1.655..97.976 rows=1006 loops=1)                                        |
|              ->  Seq Scan on "publicSuffix"  (cost=0.00..151.15 rows=8915 width=12) (actual time=0.011..0.728 rows=8915 loops=1)                       |
|              ->  Bitmap Heap Scan on "companyDomain"  (cost=0.05..4.15 rows=5 width=15) (actual time=0.010..0.010 rows=0 loops=8915)                   |
|                    Recheck Cond: (reverse((domain)::text) ~~ (reverse(("publicSuffix".suffix)::text) || '%'::text))                                    |
|                    Rows Removed by Index Recheck: 0                                                                                                    |
|                    Heap Blocks: exact=301                                                                                                              |
|                    ->  Bitmap Index Scan on companydomain_domain_reverse  (cost=0.00..0.05 rows=5 width=0) (actual time=0.010..0.010 rows=0 loops=8915)|
|                          Index Cond: (reverse((domain)::text) ~~ (reverse(("publicSuffix".suffix)::text) || '%'::text))                                |
|Planning Time: 0.150 ms                                                                                                                                 |
|Execution Time: 98.439 ms                                                                                                                               |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+

As a bonus - you do not even need to to REVERSE() the text in the index and in the query:

create index companydomain_domain
    on "companyDomain" using gin(domain gin_trgm_ops);



SELECT DISTINCT ON ("companyDomain".id) "companyDomain".domain, "publicSuffix".suffix
FROM "companyDomain"
         INNER JOIN "publicSuffix" ON "companyDomain".domain LIKE '%' || "publicSuffix".suffix
ORDER BY "companyDomain".id, LENGTH("publicSuffix".suffix) DESC

The query takes the same amount of time and still uses the gin index:

+------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------+
|Unique  (cost=40556.91..40759.28 rows=908 width=31) (actual time=96.170..96.315 rows=908 loops=1)                                               |
|  ->  Sort  (cost=40556.91..40658.10 rows=40474 width=31) (actual time=96.169..96.209 rows=1006 loops=1)                                        |
|        Sort Key: "companyDomain".id, (length(("publicSuffix".suffix)::text)) DESC                                                              |
|        Sort Method: quicksort  Memory: 103kB                                                                                                   |
|        ->  Nested Loop  (cost=0.05..37459.70 rows=40474 width=31) (actual time=1.764..95.919 rows=1006 loops=1)                                |
|              ->  Seq Scan on "publicSuffix"  (cost=0.00..151.15 rows=8915 width=12) (actual time=0.009..0.711 rows=8915 loops=1)               |
|              ->  Bitmap Heap Scan on "companyDomain"  (cost=0.05..4.12 rows=5 width=15) (actual time=0.010..0.010 rows=0 loops=8915)           |
|                    Recheck Cond: ((domain)::text ~~ ('%'::text || ("publicSuffix".suffix)::text))                                              |
|                    Rows Removed by Index Recheck: 0                                                                                            |
|                    Heap Blocks: exact=301                                                                                                      |
|                    ->  Bitmap Index Scan on companydomain_domain  (cost=0.00..0.05 rows=5 width=0) (actual time=0.010..0.010 rows=0 loops=8915)|
|                          Index Cond: ((domain)::text ~~ ('%'::text || ("publicSuffix".suffix)::text))                                          |
|Planning Time: 0.132 ms                                                                                                                         |
|Execution Time: 96.393 ms                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------+

PS: I guess you need only one of the indexes - in this case: companyDomain_domain_reverse

like image 134
Svetlin Zarev Avatar answered Sep 24 '22 15:09

Svetlin Zarev