Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HAVERSINE distance in BigQuery?

I'm looking for a way to get HAVERSINE() in BigQuery. For example, how to get the closest weather stations to an arbitrary point?

like image 265
Felipe Hoffa Avatar asked Feb 06 '17 13:02

Felipe Hoffa


People also ask

Is Haversine distance accurate?

Consequently, the Haversine formula can result in an error of up to 0.5%. To address this, Thaddeus Vincenty developed a very complicated formula that is accurate up to 0.5mm, making it the ultimate geodesic formula for all serious scientific purposes.

Is Haversine distance a metric?

The documentation says,"Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.",so I should be able to convert to km multiplying by 6371 (great distance approx for radius). This is the correct distance.


2 Answers

2019 update: BigQuery now has a native ST_DISTANCE() function, which is more accurate than Haversine.

For example:

#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
  ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
  111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
                               lat2 FLOAT64, long2 FLOAT64) AS (
  RADIANS_TO_KM(
    ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
         COS(RADIANS(long1) - RADIANS(long2)) +
         SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);

SELECT
  lat,
  lon,
  name,
  HAVERSINE(40.73943, -73.99585, lat, lon) *1000 AS haversine_distance
  , ST_DISTANCE(
      ST_GEOGPOINT(-73.99585, 40.73943)
      , ST_GEOGPOINT(lon,lat)) bqgis_distance
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY 1 DESC
LIMIT 4;

enter image description here


Using standard SQL you can define a SQL function to encapsulate the logic. For example,

#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
  ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
  111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
                               lat2 FLOAT64, long2 FLOAT64) AS (
  RADIANS_TO_KM(
    ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
         COS(RADIANS(long1) - RADIANS(long2)) +
         SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);

SELECT
  lat,
  lon,
  name,
  HAVERSINE(40.73943, -73.99585, lat, lon) AS distance_in_km
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY distance_in_km
LIMIT 4;
like image 67
Elliott Brossard Avatar answered Oct 21 '22 17:10

Elliott Brossard


2018 update: BigQuery now supports native geo functions.

ST_DISTANCE: Returns the shortest distance in meters between two non-empty GEOGRAPHYs.

Distance between NY and Seattle:

#standardSQL
WITH geopoints AS (
  SELECT ST_GEOGPOINT(lon,lat) p, name, state
  FROM `bigquery-public-data.noaa_gsod.stations`  
)

SELECT ST_DISTANCE(
  (SELECT p FROM geopoints WHERE name='PORT AUTH DOWNTN MANHATTAN WA'),
  (SELECT p FROM geopoints WHERE name='SEATTLE')
)

3866381.55
  • https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions

Legacy SQL solution (standard pending):

SELECT lat, lon, name,
  (111.045 * DEGREES(ACOS(COS(RADIANS(40.73943)) * COS(RADIANS(lat)) * COS(RADIANS(-73.99585) - RADIANS(lon)) + SIN(RADIANS(40.73943)) * SIN(RADIANS(lat))))) AS distance
FROM [bigquery-public-data:noaa_gsod.stations]
HAVING distance>0
ORDER BY distance
LIMIT 4

enter image description here

(based on http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/)

like image 25
Felipe Hoffa Avatar answered Oct 21 '22 15:10

Felipe Hoffa