Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query exceeded resource limits in Bigquery

In order to find the country for a specific IP address, I'm using the Maxmind IP address database. I've downloaded the database and imported it into Google BigQuery, so I can query it. In a separate table, I'm capturing IP-addresses from certain events in our systems. I would now like to join those two data sources.

The columns in the Maxmind database are as follows:

  • start_ip_range STRING NULLABLE
  • end_ip_range STRING NULLABLE
  • start_ip_num STRING NULLABLE
  • end_ip_num STRING NULLABLE
  • country_code STRING NULLABLE
  • country_name

The columns in my event table are:

  • request_id STRING NULLABLE
  • ip_address STRING NULLABLE

As documented here (https://dev.maxmind.com/geoip/legacy/csv/) there is a way to get an integer representation of the ip address, so I can use it to query the ip address and retrieve the country_code or country_name.

I have now constructed the following query:

SELECT
  p.*,
  g.country_code AS country_code
FROM
  `dev.event_v1` p
INNER JOIN
  `dev.geo_ip_countries` g
ON
  SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(0)] AS NUMERIC)*16777216 + 
  SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(1)] AS NUMERIC)*65536 + 
  SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(2)] AS NUMERIC)*256 + 
  SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(3)] AS NUMERIC)
BETWEEN 
  SAFE_CAST(g.start_ip_num AS INT64)
AND 
  SAFE_CAST(g.end_ip_num AS INT64)
LIMIT 100

And however this works when using a limit, it doesn't for constructing a view.

Two questions: 1. Is there a way to simplify the query 2. Google BigQuery throws an error when I try to return a large result set:

Error: Query exceeded resource limits. 28099.974050246612 CPU seconds were used, and this query must use less than 5600.0 CPU seconds.

Any help is appreciated!

Solution Rewriting the query to the following worked and solved the resource limit issue too:

SELECT
  p.*,
  g.country_code
FROM
  `dev.event_v1` p
INNER JOIN
  `dev.geo_ip_countries` g
ON 
  NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(p.ip_address),16) = NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(g.start_ip_range),16)
WHERE 
  NET.SAFE_IP_FROM_STRING(p.ip_address) 
  BETWEEN 
    NET.SAFE_IP_FROM_STRING(g.start_ip_range) 
  AND 
    NET.SAFE_IP_FROM_STRING(g.end_ip_range)
like image 593
Patrick Plaatje Avatar asked Oct 17 '22 14:10

Patrick Plaatje


1 Answers

Try below (BigQuery Standard SQL)

#standardSQL
SELECT
  p.* EXCEPT(ip_address_num),
  g.country_code AS country_code
FROM (
  SELECT *, 
    SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(0)] AS NUMERIC)*16777216 + 
    SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(1)] AS NUMERIC)*65536 + 
    SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(2)] AS NUMERIC)*256 + 
    SAFE_CAST(SPLIT(p.ip_address, ".")[OFFSET(3)] AS NUMERIC) ip_address_num
  FROM `dev.event_v1` 
) p
INNER JOIN (
  SELECT 
    SAFE_CAST(g.start_ip_num AS INT64) start_ip_num, 
    SAFE_CAST(g.end_ip_num AS INT64) end_ip_num,
    country_code
  FROM `dev.geo_ip_countries`
) g
ON ip_address_num BETWEEN g.start_ip_num AND g.end_ip_num
like image 92
Mikhail Berlyant Avatar answered Nov 03 '22 07:11

Mikhail Berlyant