I'm looking for a way to get HAVERSINE() in BigQuery. For example, how to get the closest weather stations to an arbitrary point?
Consequently, the Haversine formula can result in an error of up to 0.5%. To address this, Thaddeus Vincenty developed a very complicated formula that is accurate up to 0.5mm, making it the ultimate geodesic formula for all serious scientific purposes.
The documentation says,"Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.",so I should be able to convert to km multiplying by 6371 (great distance approx for radius). This is the correct distance.
2019 update: BigQuery now has a native ST_DISTANCE()
function, which is more accurate than Haversine.
For example:
#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
lat2 FLOAT64, long2 FLOAT64) AS (
RADIANS_TO_KM(
ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
COS(RADIANS(long1) - RADIANS(long2)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);
SELECT
lat,
lon,
name,
HAVERSINE(40.73943, -73.99585, lat, lon) *1000 AS haversine_distance
, ST_DISTANCE(
ST_GEOGPOINT(-73.99585, 40.73943)
, ST_GEOGPOINT(lon,lat)) bqgis_distance
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY 1 DESC
LIMIT 4;
Using standard SQL you can define a SQL function to encapsulate the logic. For example,
#standardSQL
CREATE TEMP FUNCTION RADIANS(x FLOAT64) AS (
ACOS(-1) * x / 180
);
CREATE TEMP FUNCTION RADIANS_TO_KM(x FLOAT64) AS (
111.045 * 180 * x / ACOS(-1)
);
CREATE TEMP FUNCTION HAVERSINE(lat1 FLOAT64, long1 FLOAT64,
lat2 FLOAT64, long2 FLOAT64) AS (
RADIANS_TO_KM(
ACOS(COS(RADIANS(lat1)) * COS(RADIANS(lat2)) *
COS(RADIANS(long1) - RADIANS(long2)) +
SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))))
);
SELECT
lat,
lon,
name,
HAVERSINE(40.73943, -73.99585, lat, lon) AS distance_in_km
FROM `bigquery-public-data.noaa_gsod.stations`
WHERE lat IS NOT NULL AND lon IS NOT NULL
ORDER BY distance_in_km
LIMIT 4;
2018 update: BigQuery now supports native geo functions.
ST_DISTANCE: Returns the shortest distance in meters between two non-empty GEOGRAPHYs.
Distance between NY and Seattle:
#standardSQL
WITH geopoints AS (
SELECT ST_GEOGPOINT(lon,lat) p, name, state
FROM `bigquery-public-data.noaa_gsod.stations`
)
SELECT ST_DISTANCE(
(SELECT p FROM geopoints WHERE name='PORT AUTH DOWNTN MANHATTAN WA'),
(SELECT p FROM geopoints WHERE name='SEATTLE')
)
3866381.55
Legacy SQL solution (standard pending):
SELECT lat, lon, name,
(111.045 * DEGREES(ACOS(COS(RADIANS(40.73943)) * COS(RADIANS(lat)) * COS(RADIANS(-73.99585) - RADIANS(lon)) + SIN(RADIANS(40.73943)) * SIN(RADIANS(lat))))) AS distance
FROM [bigquery-public-data:noaa_gsod.stations]
HAVING distance>0
ORDER BY distance
LIMIT 4
(based on http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With