BigQuery has NOAA's gsod data loaded as a public dataset - starting in 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/
How can I retrieve the historical data for any city?
To get started using a BigQuery public dataset, you must create or select a project. The first terabyte of data processed per month is free, so you can start querying public datasets without enabling billing. If you intend to go beyond the free tier, you must also enable billing. Sign in to your Google Cloud account.
You can query a table's historical data from any point in time within the time travel window by using a FOR SYSTEM_TIME AS OF clause. This clause takes a constant timestamp expression and references the version of the table that was current at that timestamp.
Subtracting a specific number of days, weeks, months, quarters, or years from a date can be done with the DATE_SUB function. The first argument is a date; the second is an interval made up of a numeric value and a unit. The supported units (date parts) are DAY, WEEK, MONTH, QUARTER, and YEAR.
Update 2019: For convenience
SELECT *
FROM `fh-bigquery.weather_gsod.all`
WHERE name='SAN FRANCISCO INTERNATIONAL A'
ORDER BY date DESC
The table is updated daily - or report here if it isn't.
For example, to get the hottest days for San Francisco stations since 1980:
SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until
FROM `fh-bigquery.weather_gsod.all`
WHERE name LIKE 'SAN FRANC%'
AND date > '1980-01-01'
GROUP BY 1,2
ORDER BY active_until DESC
Note that this query processed only 28MB thanks to a clustered table.
Similarly, but instead of using the station name, I'll use a location and a table clustered by location:
WITH city AS (SELECT ST_GEOGPOINT(-122.465, 37.807))
SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) station_until
FROM `fh-bigquery.weather_gsod.all_geoclustered`
WHERE EXTRACT(YEAR FROM date) > 1980
AND ST_DISTANCE(point_gis, (SELECT * FROM city)) < 40000
GROUP BY name, state
HAVING EXTRACT(YEAR FROM station_until)>2018
ORDER BY ST_DISTANCE(ANY_VALUE(point_gis), (SELECT * FROM city))
LIMIT 5
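To make the 40 km radius filter in the query above more concrete, here is a minimal Python sketch of the same idea outside BigQuery. It uses the haversine formula on a spherical Earth, which only approximates what ST_DISTANCE computes. The function names and the sample station coordinates are hypothetical and are not taken from the real tables.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (an approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_within(stations, lon, lat, radius_m=40_000):
    """Return (distance_m, name) pairs inside the radius, nearest first."""
    hits = [(haversine_m(lon, lat, s["lon"], s["lat"]), s["name"])
            for s in stations]
    return sorted(h for h in hits if h[0] < radius_m)


# Hypothetical sample rows, for illustration only.
sample = [
    {"name": "NEAR", "lon": -122.40, "lat": 37.75},
    {"name": "FAR", "lon": -121.00, "lat": 38.50},
]
nearby = stations_within(sample, -122.465, 37.807)
```

Sorting by distance plays the role of the ORDER BY ST_DISTANCE(...) clause, so the nearest station comes first.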
Update 2017: Standard SQL and up-to-date tables:
SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM `bigquery-public-data.noaa_gsod.gsod2016`
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day
Additional example, to show the coldest days in Chicago in this decade:
#standardSQL
SELECT year, FORMAT('%s%s',mo,da) day ,min
FROM `fh-bigquery.weather_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod201*` b
ON a.usaf=b.stn AND a.wban=b.wban
WHERE name='CHICAGO/O HARE ARPT'
AND min!=9999.9
AND mo<'03'
ORDER BY 1,2
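Two details of the query above are easy to miss: 9999.9 is a marker for a missing temperature, and mo<'03' is a string comparison that only works because months are zero-padded. The following Python sketch applies the same two filters to in-memory rows. The rows are made up for illustration, and unlike the query (which orders by year and day) it sorts by temperature to surface the coldest days directly.

```python
MISSING_TEMP = 9999.9  # GSOD marker for "no minimum temperature reported"


def coldest_winter_days(rows, limit=5):
    """Keep January/February rows with a real reading, coldest first."""
    valid = [
        r for r in rows
        # "01" and "02" sort before "03" as strings because of zero-padding.
        if r["min"] != MISSING_TEMP and r["mo"] < "03"
    ]
    return sorted(valid, key=lambda r: r["min"])[:limit]


# Hypothetical rows, not real Chicago observations.
rows = [
    {"year": "2014", "mo": "01", "da": "06", "min": -16.1},
    {"year": "2014", "mo": "07", "da": "04", "min": 60.0},
    {"year": "2015", "mo": "02", "da": "19", "min": 9999.9},
    {"year": "2015", "mo": "01", "da": "08", "min": -5.0},
]
coldest = coldest_winter_days(rows)
```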
To retrieve the historical weather for any city, we first need to find which station reports in that city. The table [fh-bigquery:weather_gsod.stations] contains the names of known stations, their state (if in the US), country, and other details.
So to find all the stations in Austin, TX, we would use a query like this:
SELECT state, name, lat, lon
FROM [fh-bigquery:weather_gsod.stations]
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
LIMIT 10
This approach has 2 problems that need to be solved:
To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and the column c counts how many days during 2015 have actual data:
SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long
FROM [fh-bigquery:weather_gsod.gsod2015] a
JOIN [fh-bigquery:weather_gsod.stations] b
ON a.wban=b.wban
AND a.stn=b.usaf
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
GROUP BY 1,2
LIMIT 10
That's good! We found 4 stations with data for Austin during 2015.
Note that we had to treat "rain" in a special way: when a station doesn't monitor for rain, it records 99.99 instead of null. Our query replaces those values with 0.
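The choice of how to handle the 99.99 marker matters more for averages than for sums. As a rough Python sketch (the function name and sample readings are hypothetical), compare counting a missing reading as 0, which mirrors IF(prcp=99.99,0,prcp), with dropping it from the average altogether:

```python
MISSING_PRCP = 99.99  # GSOD marker for "precipitation not reported"


def avg_prcp(values, missing_as_zero=True):
    """Average precipitation, handling the 99.99 marker.

    missing_as_zero=True mirrors AVG(IF(prcp=99.99,0,prcp)).
    missing_as_zero=False averages only the days that reported a value.
    Returns None when there is nothing to average.
    """
    if missing_as_zero:
        cleaned = [0.0 if v == MISSING_PRCP else v for v in values]
    else:
        cleaned = [v for v in values if v != MISSING_PRCP]
    return sum(cleaned) / len(cleaned) if cleaned else None


# Hypothetical readings: one of the four days has no report.
readings = [0.5, 99.99, 1.5, 0.0]
as_zero = avg_prcp(readings)                        # 2.0 / 4 days
skipped = avg_prcp(readings, missing_as_zero=False)  # 2.0 / 3 days
```

Counting missing days as 0 pulls the average down, so which variant is appropriate depends on whether "not reported" is likely to mean "no rain" for the station in question.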
Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:
SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM [fh-bigquery:weather_gsod.gsod2015]
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day
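For readers who export the raw rows instead, the grouping in this query can be sketched in Python as follows. This is only an illustration under assumed inputs: the row layout (string year/mo/da plus numeric min/max) and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_averages(rows):
    """Group rows by day and average min/max, like GROUP BY 1 ORDER BY day."""
    buckets = defaultdict(list)
    for r in rows:
        # year, mo and da arrive as strings, as in the GSOD tables.
        day = date(int(r["year"]), int(r["mo"]), int(r["da"]))
        buckets[day].append((r["min"], r["max"]))
    return [
        (day,
         sum(lo for lo, _ in vals) / len(vals),
         sum(hi for _, hi in vals) / len(vals))
        for day, vals in sorted(buckets.items())
    ]


# Hypothetical rows: two reports on one day, one on the day before.
rows = [
    {"year": "2015", "mo": "01", "da": "02", "min": 30.0, "max": 50.0},
    {"year": "2015", "mo": "01", "da": "01", "min": 28.0, "max": 45.0},
    {"year": "2015", "mo": "01", "da": "02", "min": 32.0, "max": 54.0},
]
summary = daily_averages(rows)
```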
There's now an official set of the NOAA data on BigQuery in addition to Felipe's "official" public dataset. There's a blog post describing it.
An example getting minimum temperatures for August 15, 2016:
SELECT
name,
value/10 AS min_temperature,
latitude,
longitude
FROM
[bigquery-public-data:ghcn_d.ghcnd_stations] AS stn
JOIN
[bigquery-public-data:ghcn_d.ghcnd_2016] AS wx
ON
wx.id = stn.id
WHERE
wx.element = 'TMIN'
AND wx.qflag IS NULL
AND STRING(wx.date) = '2016-08-15'
Which returns: