I have a table of 1.6M IP ranges with organization names. The IP addresses are converted to integers. The table is in the form of:
I have a list of 2000 unique ip addresses (e.g. 321223, 531223, ....) that need to be translated to an organization name.
I loaded the translation table as a mysql table with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per ip address, and after 15 minutes the report was still running. The query I'm using is
select organization from iptable where ip_addr BETWEEN ip_start AND ip_end
Is there a more efficient way to do this batch look-up? I'll use my fingers if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.
The range of cells in which the VLOOKUP will search for the lookup_value and the return value. You can use a named range or a table, and you can use names in the argument instead of cell references. The first column in the cell range must contain the lookup_value.
Let's recap how XLOOKUP outperforms VLOOKUP and INDEX/MATCH: It is the simplest function, with only 3 arguments needed in most cases because the default match_mode is 0 (exact match). It's a single function, unlike INDEX/MATCH, so it's faster to type.
With sorted data and a fast technique to find an exact match, INDEX-MATCH is about 13% faster than VLOOKUP. Additionally, however, you can use a version of the INDEX-MATCH technique to calculate MUCH more quickly than with VLOOKUP.
Given that you already have an index on ip_start
, this is how to use it best, assuming that you want to make one access per IP (1234
in this example):
select organization from (
select ip_end, organization
from iptable
where ip_start <= 1234
order by ip_start desc
limit 1
) subqry where 1234 <= ip_end
This will use your index to start a scan which stops immediately because of the limit 1
. The cost should only be marginally higher than the one of a simple indexed access. Of course, this technique relies on the fact that the ranges defined by ip_start
and ip_end
never overlap.
The problem with your original approach is that mysql, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With