I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL: <pre class="prettyprint"><code>CREATE TABLE ips ( id serial NOT NULL, begin_ip_num bigint, end_ip_num bigint, country_name character varying(255), CONSTRAINT ips_pkey PRIMARY KEY (id ) ) </code></pre> I've added plain B-tree indices on both <code>begin_ip_num</code> and <code>end_ip_num</code>: <pre class="prettyprint"><code>CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num); CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num ); </code></pre> The query being used is: <pre class="prettyprint"><code>SELECT ips.* FROM ips WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num; </code></pre> The problem is that my <code>BETWEEN</code> query is only using the index on <code>begin_ip_num</code>. After using the index, it filters the result using <code>end_ip_num</code>. Here's the <code>EXPLAIN ANALYZE</code> result: <pre class="prettyprint"><code>Index Scan using index_ips_on_begin_ip_num on ips (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1) Index Cond: (3065106743::bigint >= begin_ip_num) Filter: (3065106743::bigint <= end_ip_num) Rows Removed by Filter: 47596 Total runtime: 16.425 ms </code></pre> I've already tried various combinations of indices including adding a composite index on both <code>begin_ip_num</code> and <code>end_ip_num</code>.

I had exactly this same problem on a nearly identical dataset from maxmind.com's free geiop table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them, here... First of all, you must add a new column having the range type, note that int8range is required for bigint types. Next set its values appropriately, note that the '[]' parameter indicates to make the range inclusive at lower and upper bounds (rtfm). Finally add the index, note that the GiST index is where all the performance advantage comes from. <pre class="prettyprint"><code>alter table ips add column iprange int8range; update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]'); create index index_ips_on_iprange on ips using gist (iprange); </code></pre> Having laid the groundwork, you can now use the '<@' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html <pre class="prettyprint"><code>SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange); </code></pre>

PostgreSQL index not used for query on IP ranges

Tags:

indexing

range

postgresql

database-design

gist-index

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
)

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

646

asked Jan 18 '13 21:01

Zain Zafar

3 Answers

I had exactly this same problem on a nearly identical dataset from maxmind.com's free geiop table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them, here...

First of all, you must add a new column having the range type, note that int8range is required for bigint types. Next set its values appropriately, note that the '[]' parameter indicates to make the range inclusive at lower and upper bounds (rtfm). Finally add the index, note that the GiST index is where all the performance advantage comes from.

alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);

Having laid the groundwork, you can now use the '<@' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange);

192

answered Oct 08 '22 19:10

pbnelson

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2. See:

Optimizing queries on a range of timestamps (two columns)

If your IP ranges can be covered with one of the built-in network types inet or cidr, consider to replace your two bigint columns. Or, better yet, look to the additional module ip4r by Andrew Gierth (not in the standard distribution. The indexing strategy changes accordingly.

Barring that, you can check out this related answer on dba.SE with using a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:

Can spatial index help a “range - order by - limit” query

answered Oct 08 '22 18:10

Erwin Brandstetter

I'm a bit late to this party, but this is what works really well for me.

Consider installing ip4r extension. It basically allows you to define a column that can hold IP ranges. The name of the extension implies it is just for IPv4, but currently it is also support IPv6.

After you populate table with ranges within that column all you need, is to create GIST index:

CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);

I have almost 10 million ranges in my database, but queries take fraction of a milisecond:

region=> select count(*) from ip_zip ;

  count  
---------
 9566133

region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
   Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
         Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
 Planning time: 0.106 ms
 Execution time: 0.118 ms
(7 rows)

region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
   Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
         Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
 Planning time: 0.102 ms
 Execution time: 0.145 ms
(7 rows)

answered Oct 08 '22 20:10

Derek

Related questions
                            
                                Group by on Postgresql Date Time
                            
                                PostgreSQL ilike with multiple matches in Rails ActiveRecord
                            
                                How to get index of an array value in PostgreSQL?
                            
                                How to connect from docker-compose to Host PostgreSQL?
                            
                                How to correctly provide password to PostgreSQL when connecting remotely in Windows?
                            
                                PostgreSQL Selecting Most Recent Entry for a Given ID
                            
                                What does “LANGUAGE 'plpgsql' VOLATILE” mean?
                            
                                Creating an admin user in production on Spree
                            
                                Sequelize grouping by date, disregarding hours/minutes/seconds
                            
                                How can I configure Hibernate with environment variable
                            
                                Test for null in function with varying parameters
                            
                                Call to undefined function pg_connect() - Wamp
                            
                                Insert multiple rows where not exists PostgresQL
                            
                                Is array all NULLs in PostgreSQL
                            
                                Upgrade Heroku Postgres DB plan
                            
                                Default datetime with Ecto & Elixir
                            
                                * Unrecognized field at: database Did you mean?: - metrics - server - logging - DROPWIZARD
                            
                                CREATE EXTENSION postgis fails,
                            
                                Postgres Function to Validate Email Address
                            
                                pg_dump: how to install on Amazon Linux for PostgreSQL 9.5.2?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With