Joining 2 large postgres tables using int8range not scaling well

I'd like to join IP routing table information to IP whois information. I'm using Amazon's RDS, which means I can't use the Postgres ip4r extension, so I am instead using int8range types to represent the IP address ranges, with gist indexes.
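
For illustration, one way a netblock can be mapped to an int8range (a sketch; not necessarily the exact conversion used for these tables) is by subtracting inet values to get int8 bounds:

SELECT int8range(
    ('192.168.0.0'::inet - '0.0.0.0'::inet)       -- lower bound as int8
    ,('192.168.255.255'::inet - '0.0.0.0'::inet)  -- upper bound as int8
    ,'[]'                                         -- inclusive on both ends
) AS range;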

My tables look like this:

=> \d routing_details
     Table "public.routing_details"
  Column  |   Type    | Modifiers
----------+-----------+-----------
 asn      | text      |
 netblock | text      |
 range    | int8range |
Indexes:
    "idx_routing_details_netblock" btree (netblock)
    "idx_routing_details_range" gist (range)


=> \d netblock_details
    Table "public.netblock_details"
   Column   |   Type    | Modifiers
------------+-----------+-----------
 range      | int8range |
 name       | text      |
 country    | text      |
 source     | text      |
Indexes:
    "idx_netblock_details_range" gist (range)

The full routing_details table contains just under 600K rows, and netblock_details contains around 8.25M rows. There are overlapping ranges in both tables, but for each range in the routing_details table I want to get the single best (smallest) match from the netblock_details table.

I came up with 2 different queries that I think will return accurate data, one using window functions and one using DISTINCT ON:

EXPLAIN VERBOSE SELECT DISTINCT ON (r.netblock) *
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=118452809778.47..118477166326.22 rows=581300 width=91)
   Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
   ->  Sort  (cost=118452809778.47..118464988052.34 rows=4871309551 width=91)
         Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         ->  Nested Loop  (cost=0.00..115920727265.53 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, r.netblock, (upper(n.range) - lower(n.range))
               Join Filter: (r.range <@ n.range)
               ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                     Output: r.asn, r.netblock, r.range
               ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                     Output: n.range, n.name, n.country
                     ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                           Output: n.range, n.name, n.country
(14 rows)


EXPLAIN VERBOSE SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY r.range ORDER BY UPPER(n.range) - LOWER(n.range)) AS rank
FROM routing_details r JOIN netblock_details n ON r.range <@ n.range
) a WHERE rank = 1 ORDER BY netblock;

                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=118620775630.16..118620836521.53 rows=24356548 width=99)
   Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
   Sort Key: a.netblock
   ->  Subquery Scan on a  (cost=118416274956.83..118611127338.87 rows=24356548 width=99)
         Output: a.asn, a.netblock, a.range, a.range_1, a.name, a.country, a.rank
         Filter: (a.rank = 1)
         ->  WindowAgg  (cost=118416274956.83..118550235969.49 rows=4871309551 width=91)
               Output: r.asn, r.netblock, r.range, n.range, n.name, n.country, row_number() OVER (?), ((upper(n.range) - lower(n.range))), r.range
               ->  Sort  (cost=118416274956.83..118428453230.71 rows=4871309551 width=91)
                     Output: ((upper(n.range) - lower(n.range))), r.range, r.asn, r.netblock, n.range, n.name, n.country
                     Sort Key: r.range, ((upper(n.range) - lower(n.range)))
                     ->  Nested Loop  (cost=0.00..115884192443.90 rows=4871309551 width=91)
                           Output: (upper(n.range) - lower(n.range)), r.range, r.asn, r.netblock, n.range, n.name, n.country
                           Join Filter: (r.range <@ n.range)
                           ->  Seq Scan on public.routing_details r  (cost=0.00..11458.96 rows=592496 width=43)
                                 Output: r.asn, r.netblock, r.range
                           ->  Materialize  (cost=0.00..277082.12 rows=8221675 width=48)
                                 Output: n.range, n.name, n.country
                                 ->  Seq Scan on public.netblock_details n  (cost=0.00..163712.75 rows=8221675 width=48)
                                       Output: n.range, n.name, n.country
(20 rows)

The DISTINCT ON version seems slightly more efficient, so I've proceeded with that one. When I run the query against the full dataset I get an out-of-disk-space error after around a 24-hour wait. I've created a routing_details_small table with a subset of N rows of the full routing_details table to try to understand what's going on.

With N=1000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=4411888.68..4453012.20 rows=999 width=90) (actual time=124.094..133.720 rows=999 loops=1)
   ->  Sort  (cost=4411888.68..4432450.44 rows=8224705 width=90) (actual time=124.091..128.560 rows=4172 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external sort  Disk: 608kB
         ->  Nested Loop  (cost=0.41..1780498.29 rows=8224705 width=90) (actual time=0.080..101.518 rows=4172 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..20.00 rows=1000 width=42) (actual time=0.007..1.037 rows=1000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..1307.55 rows=41124 width=48) (actual time=0.063..0.089 rows=4 loops=1000)
                     Index Cond: (r.range <@ range)
 Total runtime: 134.999 ms
(9 rows)

With N=100000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=654922588.98..659034941.48 rows=200 width=144) (actual time=28252.677..29487.380 rows=98992 loops=1)
   ->  Sort  (cost=654922588.98..656978765.23 rows=822470500 width=144) (actual time=28252.673..28926.703 rows=454856 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 64488kB
         ->  Nested Loop  (cost=0.41..119890431.75 rows=822470500 width=144) (actual time=0.079..24951.038 rows=454856 loops=1)
               ->  Seq Scan on routing_details_small r  (cost=0.00..1935.00 rows=100000 width=96) (actual time=0.007..110.457 rows=100000 loops=1)
               ->  Index Scan using idx_netblock_details_range on netblock_details n  (cost=0.41..725.96 rows=41124 width=48) (actual time=0.067..0.235 rows=5 loops=100000)
                     Index Cond: (r.range <@ range)
 Total runtime: 29596.667 ms
(9 rows)

With N=250000

=> EXPLAIN ANALYZE SELECT DISTINCT ON (r.netblock) *
-> FROM routing_details_small r JOIN netblock_details n ON r.range <@ n.range
-> ORDER BY r.netblock, upper(n.range) - lower(n.range);
                                                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1651822953.55..1662103834.80 rows=200 width=144) (actual time=185835.443..190143.266 rows=247655 loops=1)
   ->  Sort  (cost=1651822953.55..1656963394.18 rows=2056176250 width=144) (actual time=185835.439..188779.279 rows=1103850 loops=1)
         Sort Key: r.netblock, ((upper(n.range) - lower(n.range)))
         Sort Method: external merge  Disk: 155288kB
         ->  Nested Loop  (cost=0.28..300651962.46 rows=2056176250 width=144) (actual time=19.325..177403.913 rows=1103850 loops=1)
               ->  Seq Scan on netblock_details n  (cost=0.00..163743.05 rows=8224705 width=48) (actual time=0.007..8160.346 rows=8224705 loops=1)
               ->  Index Scan using idx_routing_details_small_range on routing_details_small r  (cost=0.28..22.16 rows=1250 width=96) (actual time=0.018..0.018 rows=0 loops=8224705)
                     Index Cond: (range <@ n.range)
 Total runtime: 190413.912 ms
(9 rows)

Against the full table with 600k rows the query fails after around 24 hours with an error about running out of disk space, which is presumably caused by the external merge step. So the query works well and very quickly with a small routing_details table, but scales very poorly.

Any suggestions for how to improve my query, or perhaps even schema changes I could make, so that this query will work efficiently on the full dataset?

asked Jul 23 '15 by Ben Dowling


1 Answer

Originally I was thinking of a lateral join, as in other suggested approaches (for example, the last query by Erwin Brandstetter, which uses a plain int8 datatype and simple indexes), but I didn't want to put it in this answer because I didn't think it would be efficient: when you try to find all netblock ranges that cover a given range, an index doesn't help much.

I'll repeat Erwin Brandstetter's query here:

SELECT *  -- only select columns you need to make it faster
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_max - n.ip_min
   LIMIT  1
   ) n;

When you have an index on netblock_details, like this:

CREATE INDEX netblock_details_ip_min_max_idx ON netblock_details 
(ip_min, ip_max DESC NULLS LAST);

you can quickly (in O(log N)) find the starting point of the scan in the netblock_details table - either the maximum n.ip_min that is less than r.ip_min, or the minimum n.ip_max that is greater than r.ip_max. But then you have to scan/read the rest of the index/table, and for each row do the second part of the check and filter out most rows.

In other words, the index helps to quickly find the starting row that satisfies the first search criterion, n.ip_min <= r.ip_min, but then you'll keep reading all rows that satisfy that criterion and perform the second check, n.ip_max >= r.ip_max, on each of them. On average (if the data is evenly distributed) you'll have to read half the rows of the netblock_details table. The optimizer may choose to use the index to search on n.ip_max >= r.ip_max first and then apply the second filter n.ip_min <= r.ip_min, but you can't use this index to apply both filters together.

End result: for each row from routing_details we'd read through half the rows of netblock_details. 600K * 4M = 2,400,000,000,000 rows. That is only 2 times better than the Cartesian product. You can see this number (the Cartesian product) in the output of EXPLAIN in the question:

592,496 * 8,221,675 = 4,871,309,550,800

No wonder your queries run out of disk space when trying to materialize and sort this.


The overall high level process to get to the final result:

  • join the two tables (without finding the best match yet). In the worst case it is a Cartesian product; in the best case it is all covering ranges from netblock_details for each range from routing_details. You said that there are multiple entries in netblock_details for each entry in routing_details, anything from 3 to around 10, so the result of this join could have ~6M rows (not too many)

  • order/partition the result of the join by the routing_details ranges and for each such range find the best (smallest) covering range from netblock_details.


My idea is to reverse the query. Instead of finding all covering ranges from the larger netblock_details for each row of the smaller routing_details table, I suggest finding all smaller ranges from the smaller routing_details for each row of the larger netblock_details.

Two step process

For each row of the larger netblock_details, find all ranges from routing_details that lie inside the netblock range.

I would use the following schema in the queries. I've added a primary key ID to both tables.

CREATE TABLE routing_details (
ID        int
,ip_min   int8
,ip_max   int8
,asn      text
,netblock text
);

CREATE TABLE netblock_details (
ID        int
,ip_min   int8
,ip_max   int8
,name     text
,country  text
,source   text
);

SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            -- note how routing_details.ip_min is limited from both sides
            -- this would make it possible to scan only (hopefully) small
            -- portion of the table instead of full or half table
            AND routing_details.ip_max <= netblock_details.ip_max
            -- this clause ensures that the whole routing range
            -- is inside the netblock range
    ) AS r ON true

We need an index on routing_details (ip_min, ip_max). The main thing here is the index on ip_min. Having the second column in the index helps by eliminating the need to look up the value of ip_max; it doesn't help in the tree search.
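
A sketch of that index (the name is illustrative):

CREATE INDEX idx_routing_details_ip_min_max
    ON routing_details (ip_min, ip_max);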

Note that the lateral subquery doesn't have a LIMIT. It is not the final result yet. This query should return ~6M rows. Save this result in a temporary table. Add an index to the temporary table on (r_ID, n_length, n_ID). n_ID is again just to remove extra lookups. We need the index to avoid sorting the data for each r_ID.
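
A sketch of materializing this intermediate result with the step-one query above (the temp table and index names are illustrative):

CREATE TEMPORARY TABLE temp AS
SELECT
    netblock_details.ID AS n_ID
    ,netblock_details.ip_max - netblock_details.ip_min AS n_length
    ,r.ID AS r_ID
FROM
    netblock_details
    INNER JOIN LATERAL
    (
        SELECT routing_details.ID
        FROM routing_details
        WHERE
            routing_details.ip_min >= netblock_details.ip_min
            AND routing_details.ip_min <= netblock_details.ip_max
            AND routing_details.ip_max <= netblock_details.ip_max
    ) AS r ON true;

-- this index lets the final step pick the best match per r_ID without sorting
CREATE INDEX idx_temp_r_id_n_length ON temp (r_ID, n_length, n_ID);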

Final step

For each row of routing_details, find the n_ID with the smallest n_length. Now we can use the lateral join in the "proper" order. Here temp is the result of the previous step, with its index.

SELECT
    routing_details.*
    ,t.n_ID
    ,netblock_details.*
FROM
    routing_details
    INNER JOIN LATERAL
    (
        SELECT temp.n_ID
        FROM temp
        WHERE temp.r_ID = routing_details.ID
        ORDER BY temp.n_length
        LIMIT 1
    ) AS t ON true
    INNER JOIN netblock_details ON netblock_details.ID = t.n_ID

Here the subquery should be an index seek, not a scan. The optimizer should be smart enough to do just the seek and return the first result found because of LIMIT 1, so you'll have 600K index seeks against the 6M-row temp table.


Original answer (I'll keep it just for the diagram of ranges):

Can you "cheat"?

If I understood your query correctly, for each routing_details.range you want to find the smallest netblock_details.range that covers (is larger than) routing_details.range.

Given the following example, where r is routing range and n1, ..., n8 are netblock ranges, the correct answer is n5.

   |---|
   n1

     |------------------|
     n2

                           |---------------|
                           n3

                                          |-----|
                                          n4

                  |------------------|
                  n5

                     |--------------------------------------|
                     n6

        |---------------------------|
        n7

                      |-----|
                      n8

                      |------------|
                      r
                     start       end

n.start <= r.start AND n.end >= r.end
order by n.length
limit 1 

Your query that took 24 hours shows that the index scan took 6ms, but sorting by range length took 80ms.

With this kind of search there is no simple one-dimensional ordering of the data: you are using n.start together with n.end and with n.length. But maybe your data is not that generic, or maybe it is OK to return a somewhat different result.

For example, if it were OK to return n6 as the result, the query could work much faster. n6 is the covering range that has the largest start:

n.start <= r.start AND n.end >= r.end
order by n.start desc
limit 1 

Or, you could go for n7, which has the smallest end:

n.start <= r.start AND n.end >= r.end
order by n.end
limit 1 

This kind of search would use a simple index on n.start (or n.end) without any extra sorting.
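
For instance, here is the "largest start" variant written as a full query against the ip_min/ip_max schema above (a sketch; ip_min plays the role of start and ip_max of end):

SELECT r.*, n.*
FROM   routing_details r
     , LATERAL (
   SELECT *
   FROM   netblock_details n
   WHERE  n.ip_min <= r.ip_min
   AND    n.ip_max >= r.ip_max
   ORDER  BY n.ip_min DESC  -- largest start wins, so n6 may be returned instead of n5
   LIMIT  1
   ) n;

With a plain btree index on n.ip_min, the ORDER BY ... LIMIT 1 can stop at the first row that passes the ip_max check instead of sorting all candidates.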


A second, completely different approach: how big/long are the ranges? If they are relatively short (a few numbers), then you could try to store them as an explicit list of integers instead of a range. For example, the range [5-8] would be stored as 4 rows: (5, 6, 7, 8). With this storage model it may be easier to find intersections of ranges.
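
A sketch of that expansion using generate_series (the netblock_addresses table name is assumed; this is practical only while the ranges stay short):

-- expand each netblock range into one row per contained integer
CREATE TABLE netblock_addresses AS
SELECT
    n.ID
    ,g.ip
FROM
    netblock_details n
    ,LATERAL generate_series(n.ip_min, n.ip_max) AS g(ip);

A plain btree index on ip then turns containment checks into simple equality lookups.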

answered Nov 15 '22 by Vladimir Baranov