I'd like to join IP routing table information to IP whois information. I'm using Amazon RDS, which means I can't use the Postgres ip4r extension, so I'm instead using int8range types to represent the IP address ranges, with gist indexes.
My tables look like this:
=> \d routing_details
Table "public.routing_details"
Column | Type | Modifiers
----------+-----------+-----------
asn | text |
netblock | text |
range | int8range |
Indexes:
"idx_routing_details_netblock" btree (netblock)
"idx_routing_details_range" gist (range)
=> \d netblock_details
Table "public.netblock_details"
Column | Type | Modifiers
------------+-----------+-----------
range | int8range |
name | text |
country | text |
source | text |
Indexes:
"idx_netblock_details_range" gist (range)
The full routing_details table contains just under 600K rows, and netblock_details contains around 8.25M rows. There are overlapping ranges in both tables, but for each range in the routing_details table I want to get the single best (smallest) match from the netblock_details table.
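For context, each row's range column holds the numeric span of a netblock, with the start and end IP addresses encoded as 64-bit integers. A minimal sketch of how a row might be populated (the specific netblock and ASN values are just illustrations):

-- Hypothetical example: 192.168.0.0/24 spans addresses
-- 3232235520 .. 3232235775 when encoded as 64-bit integers.
INSERT INTO routing_details (asn, netblock, range)
VALUES ('AS64496', '192.168.0.0/24', int8range(3232235520, 3232235775, '[]'));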
I came up with two different queries that I think will return the correct data, one using window functions and one using DISTINCT ON:
EXPLAIN SELECT DISTINCT ON (r.netblock) *
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
ORDER BY r.netblock, upper(n.range) - lower(n.range);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Unique (cost=118452809778.47..118477166326.22 rows=581300 width=91)
Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
-> Sort (cost=118452809778.47..118464988052.34 rows=4871309551 width=91)
Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
-> Nested Loop (cost=0.00..115920727265.53 rows=4871309551 width=91)
Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, (upper(n.range) - lower(n.range))
Join Filter: (r.range <@ n.range)
-> Seq Scan on public.routing_details r (cost=0.00..11458.96 rows=592496 width=43)
Output: r.asn, r.netblock, r.range
-> Materialize (cost=0.00..277082.12 rows=8221675 width=48)
Output: n.range, n.name, n.country
-> Seq Scan on public.netblock_details n (cost=0.00..163712.75 rows=8221675 width=48)
Output: n.range, n.name, n.country
(14 rows)
EXPLAIN VERBOSE SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY r.range ORDER BY UPPER(n.range) - LOWER(n.range)) AS rank
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
) a WHERE rank = 1 ORDER BY netblock;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=118620775630.16..118620836521.53 rows=24356548 width=99)
Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
Sort Key: a.netblock
-> Subquery Scan on a (cost=118416274956.83..118611127338.87 rows=24356548 width=99)
Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
Filter: (a.rank = 1)
-> WindowAgg (cost=118416274956.83..118550235969.49 rows=4871309551 width=91)
Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, row_number() OVER (?), ((upper(n.range) - lower(n.range))), r.range
-> Sort (cost=118416274956.83..118428453230.71 rows=4871309551 width=91)
Output: ((upper(n.range) - lower(n.range))), r.range, r.asn, r.netblock, n.range, n.name, n.country
Sort Key: r.range, ((upper(n.range) - lower(n.range)))
-> Nested Loop (cost=0.00..115884192443.90 rows=4871309551 width=91)
Output: (upper(n.range) - lower(n.range)), r.range, r.asn, r.netblock, n.range, n.name, n.country
Join Filter: (r.range <@ n.range)
-> Seq Scan on public.routing_details r (cost=0.00..11458.96 rows=592496 width=43)
Output: r.asn, r.netblock, r.range
-> Materialize (cost=0.00..277082.12 rows=8221675 width=48)
Output: n.range, n.name, n.country
-> Seq Scan on public.netblock_details n (cost=0.00..163712.75 rows=8221675 width=48)
Output: n.range, n.name, n.country
(20 rows)
The DISTINCT ON seems slightly more efficient, so I've proceeded with that one. When I run the query against the full dataset I get an out-of-disk-space error after around a 24h wait. I've created a routing_details_small table with a subset of N rows of the full routing_details table to try to understand what's going on.
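One way to build such a subset table (a sketch; how the N rows are sampled is arbitrary, and the LIMIT varies with each N below; the gist index mirrors the one on the full table):

-- Take an arbitrary subset of N rows and index it like the full table.
CREATE TABLE routing_details_small AS
SELECT * FROM routing_details LIMIT 1000;
CREATE INDEX idx_routing_details_small_range ON routing_details_small USING gist (range);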
With N=1000
=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=4411888.68..4453012.20 rows=999 width=90) (actual time=124.094..133.720 rows=999 loops=1)
-> Sort (cost=4411888.68..4432450.44 rows=8224705 width=90) (actual time=124.091..128.560 rows=4172 loops=1)
Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
Sort Method: external sort Disk: 608kB
-> Nested Loop (cost=0.41..1780498.29 rows=8224705 width=90) (actual time=0.080..101.518 rows=4172 loops=1)
-> Seq Scan on routing_details_small r (cost=0.00..20.00 rows=1000 width=42) (actual time=0.007..1.037 rows=1000 loops=1)
-> Index Scan using idx_netblock_details_range on netblock_details n (cost=0.41..1307.55 rows=41124 width=48) (actual time=0.063..0.089 rows=4 loops=1000)
Index Cond: (r.range <@ range)
Total runtime: 134.999 ms
(9 rows)
With N=100000
=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=654922588.98..659034941.48 rows=200 width=144) (actual time=28252.677..29487.380 rows=98992 loops=1)
-> Sort (cost=654922588.98..656978765.23 rows=822470500 width=144) (actual time=28252.673..28926.703 rows=454856 loops=1)
Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
Sort Method: external merge Disk: 64488kB
-> Nested Loop (cost=0.41..119890431.75 rows=822470500 width=144) (actual time=0.079..24951.038 rows=454856 loops=1)
-> Seq Scan on routing_details_small r (cost=0.00..1935.00 rows=100000 width=96) (actual time=0.007..110.457 rows=100000 loops=1)
-> Index Scan using idx_netblock_details_range on netblock_details n (cost=0.41..725.96 rows=41124 width=48) (actual time=0.067..0.235 rows=5 loops=100000)
Index Cond: (r.range <@ range)
Total runtime: 29596.667 ms
(9 rows)
With N=250000
=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=1651822953.55..1662103834.80 rows=200 width=144) (actual time=185835.443..190143.266 rows=247655 loops=1)
-> Sort (cost=1651822953.55..1656963394.18 rows=2056176250 width=144) (actual time=185835.439..188779.279 rows=1103850 loops=1)
Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
Sort Method: external merge Disk: 155288kB
-> Nested Loop (cost=0.28..300651962.46 rows=2056176250 width=144) (actual time=19.325..177403.913 rows=1103850 loops=1)
-> Seq Scan on netblock_details n (cost=0.00..163743.05 rows=8224705 width=48) (actual time=0.007..8160.346 rows=8224705 loops=1)
-> Index Scan using idx_routing_details_small_range on routing_details_small r (cost=0.28..22.16 rows=1250 width=96) (actual time=0.018..0.018 rows=0 loops=8224705)
Index Cond: (range <@ n.range)
Total runtime: 190413.912 ms
(9 rows)
Against the full table with 600K rows the query fails after around 24h with an error about running out of disk space, presumably caused by the external merge step. So the query works well and very quickly with a small routing_details table, but scales very poorly.
Any suggestions for how to improve my query, or perhaps even schema changes I could make, so that this query will work efficiently on the full dataset?
I was thinking originally of a lateral join, as in other suggested approaches (for example, the last query by Erwin Brandstetter, where he uses a simple int8 datatype and simple indexes), but didn't want to write it in the answer because I thought it was not really efficient. When you try to find all netblock ranges that cover the given range, an index doesn't help much.
I'll repeat Erwin Brandstetter's query here:
SELECT * -- only select columns you need to make it faster
FROM routing_details r
, LATERAL (
SELECT *
FROM netblock_details n
WHERE n.ip_min <= r.ip_min
AND n.ip_max >= r.ip_max
ORDER BY n.ip_max - n.ip_min
LIMIT 1
) n;
When you have an index on netblock_details, like this:
CREATE INDEX netblock_details_ip_min_max_idx ON netblock_details
(ip_min, ip_max DESC NULLS LAST);
you can quickly (in O(logN)) find the starting point of the scan in the netblock_details table - either the maximum n.ip_min that is less than r.ip_min, or the minimum n.ip_max that is more than r.ip_max. But then you have to scan/read the rest of the index/table, and for each row do the second part of the check and filter out most rows.
In other words, this index helps to quickly find the starting row that satisfies the first search criterion, n.ip_min <= r.ip_min, but then you'll continue reading all rows that satisfy this criterion, and for each such row perform the second check, n.ip_max >= r.ip_max. On average (if the data has an even distribution) you'll have to read half of the rows of the netblock_details table. The optimizer may choose to use the index to search n.ip_max >= r.ip_max first and then apply the second filter n.ip_min <= r.ip_min, but you can't use this index to apply both filters together.
End result: for each row from routing_details we'll read through half of the rows from netblock_details. 600K * 4M = 2,400,000,000,000 rows. That is only 2 times better than the Cartesian product. You can see this number (the Cartesian product) in the output of EXPLAIN ANALYZE in the question:
592,496 * 8,221,675 = 4,871,309,550,800
No wonder your queries run out of disk space when trying to materialize and sort this.
The overall high-level process to get to the final result:
- join the two tables (without finding the best match yet). In the worst case it is the Cartesian product; in the best case it is all covering ranges from netblock_details for each range from routing_details. You said that there are multiple entries in netblock_details for each entry in routing_details, anything from 3 to around 10, so the result of this join could have ~6M rows (not too many).
- order/partition the result of the join by the routing_details ranges, and for each such range find the best (smallest) covering range from netblock_details.
My idea is to reverse the query. Instead of finding all covering ranges from the larger netblock_details for each row from the smaller routing_details table, I suggest finding all smaller ranges from the smaller routing_details for each row from the larger netblock_details.
Two-step process
For each row from the larger netblock_details, find all ranges from routing_details that are inside the netblock range.
I would use the following schema in the queries. I've added a primary key ID to both tables.
CREATE TABLE routing_details (
ID int
,ip_min int8
,ip_max int8
,asn text
,netblock text
);
CREATE TABLE netblock_details (
ID int
,ip_min int8
,ip_max int8
,name text
,country text
,source text
);
SELECT
netblock_details.ID AS n_ID
,netblock_details.ip_max - netblock_details.ip_min AS n_length
,r.ID AS r_ID
FROM
netblock_details
INNER JOIN LATERAL
(
SELECT routing_details.ID
FROM routing_details
WHERE
routing_details.ip_min >= netblock_details.ip_min
AND routing_details.ip_min <= netblock_details.ip_max
-- note how routing_details.ip_min is limited from both sides
-- this would make it possible to scan only (hopefully) small
-- portion of the table instead of full or half table
AND routing_details.ip_max <= netblock_details.ip_max
-- this clause ensures that the whole routing range
-- is inside the netblock range
) AS r ON true
We need an index on routing_details on (ip_min, ip_max). The main thing here is the index on ip_min. Having the second column in the index helps by eliminating the need to do a lookup for the value of ip_max; it doesn't help in the tree search.
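A sketch of that index (the name is mine):

CREATE INDEX routing_details_ip_min_max_idx
    ON routing_details (ip_min, ip_max);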
Note that the lateral subquery doesn't have a LIMIT. It is not the final result yet. This query should return ~6M rows. Save this result in a temporary table.
Add an index to the temporary table on (r_ID, n_length, n_ID). n_ID is again there just to remove extra lookups. We need the index to avoid sorting the data for each r_ID.
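Put together, this intermediate step might look like the following sketch (the temp table and index names are mine; the query body is the one above):

CREATE TEMP TABLE temp AS
SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            AND routing_details.ip_max <= netblock_details.ip_max
    ) AS r ON true;

CREATE INDEX temp_r_id_n_length_n_id_idx ON temp (r_ID, n_length, n_ID);
-- temp tables are not analyzed automatically; do it manually so the
-- planner knows to use the index in the final step
ANALYZE temp;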
Final step
For each row from routing_details, find the n_ID with the smallest n_length. Now we can use the lateral join in the "proper" order. Here the temp table is the result of the previous step, with the index.
SELECT
routing_details.*
,t.n_ID
,netblock_details.*
FROM
routing_details
INNER JOIN LATERAL
(
SELECT temp.n_ID
FROM temp
WHERE temp.r_ID = routing_details.ID
ORDER BY temp.n_length
LIMIT 1
) AS t ON true
INNER JOIN netblock_details ON netblock_details.ID = t.n_ID
Here the subquery should be a seek of the index, not a scan. The optimizer should be smart enough to do just the seek and return the first found result because of the LIMIT 1, so you'll have 600K seeks of the index in the 6M-row temp table.
Original answer (I'll keep it just for the diagram of ranges):
Can you "cheat"?
If I understood your query correctly, for each routing_details.range you want to find the smallest netblock_details.range that covers/is larger than routing_details.range.
Given the following example, where r is the routing range and n1, ..., n8 are netblock ranges, the correct answer is n5.

|---|                                                       n1
  |------------------|                                      n2
                      |---------------|                     n3
                        |-----|                             n4
                |------------------|                        n5
                 |--------------------------------------|   n6
     |---------------------------|                          n7
                                        |-----|             n8

                  |------------|                            r
                  start      end
n.start <= r.start AND n.end >= r.end
order by n.length
limit 1
Your query that took 14 hours shows that the index scan took 6ms, but sorting by range length took 80ms.
With this kind of search there is no simple 1D ordering of the data. You are using n.start together with n.end and together with n.length. But maybe your data is not that generic, or it is OK to return a somewhat different result.
For example, if it was OK to return n6 as the result, it could work much faster. n6 is the covering range that has the largest start:
n.start <= r.start AND n.end >= r.end
order by n.start desc
limit 1
Or, you could go for n7, which has the smallest end:
n.start <= r.start AND n.end >= r.end
order by n.end
limit 1
This kind of search would use a simple index on n.start (or n.end) without extra sorting.
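In terms of the ip_min/ip_max schema above, the supporting index for the largest-start variant would be something like this sketch (the index name is mine):

-- Lets the planner seek to r.ip_min and walk the index backward,
-- returning the first row that also passes the ip_max check;
-- no sort step is needed.
CREATE INDEX netblock_details_ip_min_idx ON netblock_details (ip_min);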
A second, completely different approach: how big/long are the ranges? If they are relatively short (a few numbers), then you could try to store them as an explicit list of integers instead of a range.
For example, the range [5-8] would be stored as 4 rows: (5, 6, 7, 8). With this storage model it may be easier to find intersections of ranges.