This query finds the suffix of a domain:
SELECT
DISTINCT ON ("companyDomain".id)
"companyDomain".domain,
"publicSuffix".suffix
FROM
"companyDomain"
INNER JOIN
"publicSuffix"
ON
REVERSE("companyDomain".domain) LIKE REVERSE("publicSuffix".suffix) || '%'
ORDER BY
"companyDomain".id, LENGTH("publicSuffix".suffix) DESC
Edit: Notice this also works with subdomains.
You can fiddle with the example here and visualize the plan with pev. I've tried adding covering indexes to the tables but they end up not being used by the query planner. Also perhaps there's another query that could be more efficient?
Have you considered using a gin
index ?
I made the following modifications to your sample DML:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
...
CREATE INDEX companyDomain_domain_reverse ON "companyDomain" USING gin (REVERSE(domain) gin_trgm_ops);
...
CREATE INDEX publicSuffix_suffix_reverse ON "publicSuffix" USING gin (REVERSE(suffix) gin_trgm_ops);
And here is the query plan:
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
|Unique (cost=40802.07..41004.44 rows=908 width=31) (actual time=98.229..98.356 rows=908 loops=1) |
| -> Sort (cost=40802.07..40903.26 rows=40474 width=31) (actual time=98.228..98.264 rows=1006 loops=1) |
| Sort Key: "companyDomain".id, (length(("publicSuffix".suffix)::text)) DESC |
| Sort Method: quicksort Memory: 103kB |
| -> Nested Loop (cost=0.05..37704.86 rows=40474 width=31) (actual time=1.655..97.976 rows=1006 loops=1) |
| -> Seq Scan on "publicSuffix" (cost=0.00..151.15 rows=8915 width=12) (actual time=0.011..0.728 rows=8915 loops=1) |
| -> Bitmap Heap Scan on "companyDomain" (cost=0.05..4.15 rows=5 width=15) (actual time=0.010..0.010 rows=0 loops=8915) |
| Recheck Cond: (reverse((domain)::text) ~~ (reverse(("publicSuffix".suffix)::text) || '%'::text)) |
| Rows Removed by Index Recheck: 0 |
| Heap Blocks: exact=301 |
| -> Bitmap Index Scan on companydomain_domain_reverse (cost=0.00..0.05 rows=5 width=0) (actual time=0.010..0.010 rows=0 loops=8915)|
| Index Cond: (reverse((domain)::text) ~~ (reverse(("publicSuffix".suffix)::text) || '%'::text)) |
|Planning Time: 0.150 ms |
|Execution Time: 98.439 ms |
+--------------------------------------------------------------------------------------------------------------------------------------------------------+
As a bonus - you do not even need to to REVERSE()
the text in the index and in the query:
create index companydomain_domain
on "companyDomain" using gin(domain gin_trgm_ops);
SELECT DISTINCT ON ("companyDomain".id) "companyDomain".domain, "publicSuffix".suffix
FROM "companyDomain"
INNER JOIN "publicSuffix" ON "companyDomain".domain LIKE '%' || "publicSuffix".suffix
ORDER BY "companyDomain".id, LENGTH("publicSuffix".suffix) DESC
The query takes the same amount of time and still uses the gin index:
+------------------------------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------------+
|Unique (cost=40556.91..40759.28 rows=908 width=31) (actual time=96.170..96.315 rows=908 loops=1) |
| -> Sort (cost=40556.91..40658.10 rows=40474 width=31) (actual time=96.169..96.209 rows=1006 loops=1) |
| Sort Key: "companyDomain".id, (length(("publicSuffix".suffix)::text)) DESC |
| Sort Method: quicksort Memory: 103kB |
| -> Nested Loop (cost=0.05..37459.70 rows=40474 width=31) (actual time=1.764..95.919 rows=1006 loops=1) |
| -> Seq Scan on "publicSuffix" (cost=0.00..151.15 rows=8915 width=12) (actual time=0.009..0.711 rows=8915 loops=1) |
| -> Bitmap Heap Scan on "companyDomain" (cost=0.05..4.12 rows=5 width=15) (actual time=0.010..0.010 rows=0 loops=8915) |
| Recheck Cond: ((domain)::text ~~ ('%'::text || ("publicSuffix".suffix)::text)) |
| Rows Removed by Index Recheck: 0 |
| Heap Blocks: exact=301 |
| -> Bitmap Index Scan on companydomain_domain (cost=0.00..0.05 rows=5 width=0) (actual time=0.010..0.010 rows=0 loops=8915)|
| Index Cond: ((domain)::text ~~ ('%'::text || ("publicSuffix".suffix)::text)) |
|Planning Time: 0.132 ms |
|Execution Time: 96.393 ms |
+------------------------------------------------------------------------------------------------------------------------------------------------+
PS: I guess you need only one of the indexes - in this case: companyDomain_domain_reverse
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With