Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL index not used for query on IP ranges

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here's the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id )
)

I've added plain B-tree indices on both begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num );

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query is only using the index on begin_ip_num. After using the index, it filters the result using end_ip_num. Here's the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indices including adding a composite index on both begin_ip_num and end_ip_num.

like image 646
Zain Zafar Avatar asked Jan 18 '13 21:01

Zain Zafar


People also ask

Why is Postgres not using my index?

As we saw above, running a couple of queries on our posts table reveals that even given an index to use, Postgres will not always choose to use it. The reason why this is the case is that indexes have a cost to create and maintain (on writes) and use (on reads).

Does Postgres automatically use index?

PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table. The index covers the columns that make up the primary key or unique constraint (a multicolumn index, if appropriate), and is the mechanism that enforces the constraint.

Do I need to index foreign keys Postgres?

In contrast to the above, PostgreSQL requires no index at the source of a foreign key. However, such an index is quite useful for finding all source rows that reference a target row.


3 Answers

I had exactly this same problem on a nearly identical dataset from maxmind.com's free geiop table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best about 3 rows per second. With it I queried nearly 500000 rows in under 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them, here...

First of all, you must add a new column having the range type, note that int8range is required for bigint types. Next set its values appropriately, note that the '[]' parameter indicates to make the range inclusive at lower and upper bounds (rtfm). Finally add the index, note that the GiST index is where all the performance advantage comes from.

alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);

Having laid the groundwork, you can now use the '<@' contained-by operator to search specific addresses against the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange);
like image 192
pbnelson Avatar answered Oct 08 '22 19:10

pbnelson


Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Ordering is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it is important for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the address, where the rest of the index fulfills the first condition. Then it can, for each value of the first column, return all rows that fulfill the second condition, until the first one fails. Then jump to the next value of the first column, etc.
This is still not very efficient and Postgres may be faster just scanning the first index column and filtering for the second. Very much depends on your data distribution.

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. Can help performance a lot with if you have lots of rows per value of the first column. Else it is hardly effective.
(There are also non-blocking external tools for the purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly or have you run ANALYZE on the table? You need current statistics for Postgres to pick appropriate query plans.

What would really help here is a GiST index for a int8range column, available since PostgreSQL 9.2. See:

  • Optimizing queries on a range of timestamps (two columns)

If your IP ranges can be covered with one of the built-in network types inet or cidr, consider to replace your two bigint columns. Or, better yet, look to the additional module ip4r by Andrew Gierth (not in the standard distribution. The indexing strategy changes accordingly.

Barring that, you can check out this related answer on dba.SE with using a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:

  • Can spatial index help a “range - order by - limit” query
like image 22
Erwin Brandstetter Avatar answered Oct 08 '22 18:10

Erwin Brandstetter


I'm a bit late to this party, but this is what works really well for me.

Consider installing ip4r extension. It basically allows you to define a column that can hold IP ranges. The name of the extension implies it is just for IPv4, but currently it is also support IPv6.

After you populate table with ranges within that column all you need, is to create GIST index:

CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);

I have almost 10 million ranges in my database, but queries take fraction of a milisecond:

region=> select count(*) from ip_zip ;

  count  
---------
 9566133

region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
   Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
         Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
 Planning time: 0.106 ms
 Execution time: 0.118 ms
(7 rows)

region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
   Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
         Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
 Planning time: 0.102 ms
 Execution time: 0.145 ms
(7 rows)
like image 38
Derek Avatar answered Oct 08 '22 20:10

Derek