Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient lookup in a range table

Tags:

sql

mysql

ruby

I have a table of 1.6M IP ranges with organization names. The IP addresses are converted to integers. The table is in the form of:

enter image description here

I have a list of 2000 unique ip addresses (e.g. 321223, 531223, ....) that need to be translated to an organization name.

I loaded the translation table as a mysql table with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per ip address, and after 15 minutes the report was still running. The query I'm using is

select organization from iptable where ip_addr BETWEEN ip_start AND ip_end

Is there a more efficient way to do this batch look-up? I'll use my fingers if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.

like image 530
gitb Avatar asked Oct 09 '13 22:10

gitb


People also ask

Can you do a VLOOKUP with a range?

The range of cells in which the VLOOKUP will search for the lookup_value and the return value. You can use a named range or a table, and you can use names in the argument instead of cell references. The first column in the cell range must contain the lookup_value.

Is VLOOKUP faster than Xlookup?

Let's recap how XLOOKUP outperforms VLOOKUP and INDEX/MATCH: It is the simplest function, with only 3 arguments needed in most cases because the default match_mode is 0 (exact match). It's a single function, unlike INDEX/MATCH, so it's faster to type.

Which is more efficient VLOOKUP or INDEX match?

With sorted data and a fast technique to find an exact match, INDEX-MATCH is about 13% faster than VLOOKUP. Additionally, however, you can use a version of the INDEX-MATCH technique to calculate MUCH more quickly than with VLOOKUP.


1 Answers

Given that you already have an index on ip_start, this is how to use it best, assuming that you want to make one access per IP (1234 in this example):

select organization from (
    select ip_end, organization
    from iptable
    where ip_start <= 1234
    order by ip_start desc
    limit 1
) subqry where 1234 <= ip_end

This will use your index to start a scan which stops immediately because of the limit 1. The cost should only be marginally higher than the one of a simple indexed access. Of course, this technique relies on the fact that the ranges defined by ip_start and ip_end never overlap.

The problem with your original approach is that mysql, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.

like image 156
Walter Tross Avatar answered Oct 12 '22 13:10

Walter Tross