I'm trying to query multiple tables from a dataset in BigQuery using dplyr and bigrquery. The dataset holds multiple tables, one for each day of data in a year. I can query a single table (e.g., one day of data) with the following code, but I can't seem to make it work across multiple tables at once (e.g., for a month or a year of data). Any help would be greatly appreciated.
connection <- src_bigquery("my_project", "dataset1")
first_day <- connection %>%
  tbl("20150101") %>%
  select(field1) %>%
  group_by(field1) %>%
  summarise(number = n()) %>%
  arrange(desc(number))
Thank you,
Juan
As far as I know, there is no support for table wildcard functions in dplyr and bigrquery at the moment. If you don't mind ugly hacks, however, you can extract and edit the query that dplyr builds and sends to BigQuery so that it points to several tables instead of just one.
Set your billing information and connect to BigQuery:
my_billing <- "##########"  # replace with your billing project ID
bq_db <- src_bigquery(
  project = "bigquery-public-data",
  dataset = "noaa_gsod",
  billing = my_billing
)
gsod <- tbl(bq_db, "gsod1929")
How to select from one table (just for comparison):
gsod %>%
  filter(stn == "030750") %>%
  select(year, mo, da, temp) %>%
  collect()
Source: local data frame [92 x 4]
year mo da temp
(chr) (chr) (chr) (dbl)
1 1929 10 01 45.2
2 1929 10 02 49.2
3 1929 10 03 48.2
4 1929 10 04 43.5
5 1929 10 05 42.0
6 1929 10 06 51.0
7 1929 10 07 48.0
8 1929 10 08 43.7
9 1929 10 09 45.1
10 1929 10 10 51.3
.. ... ... ... ...
How to select from multiple tables by manually editing the query generated by dplyr:
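# Build (but do not run) the single-table query
multi_query <- gsod %>%
  filter(stn == "030750") %>%
  select(year, mo, da, temp) %>%
  dplyr:::build_query(.)
# Comma-separated list of fully qualified tables
multi_tables <- paste("[bigquery-public-data:noaa_gsod.gsod", c(1929, 1930), "]",
                      sep = "", collapse = ", ")
# Swap the single table reference for the list and run the edited query
query_exec(
  query = gsub("\\[gsod1929\\]", multi_tables, multi_query$sql),
  project = my_billing
) %>% tbl_df()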
Source: local data frame [449 x 4]
year mo da temp
(chr) (chr) (chr) (dbl)
1 1930 06 11 51.8
2 1930 05 20 46.8
3 1930 05 21 48.5
4 1930 07 04 56.0
5 1930 08 08 54.5
6 1930 06 06 52.0
7 1930 01 14 36.8
8 1930 01 27 32.9
9 1930 02 08 35.6
10 1930 02 11 38.5
.. ... ... ... ...
Validation of the results:
table(.Last.value$year)
1929 1930
92 357
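For the daily tables in the question, the same table list can be generated from a date range rather than typed by hand. The following is only a sketch in base R: it assumes the tables are named YYYYMMDD (as in "20150101"), reuses the project and dataset names from the question, and uses a made-up `single_sql` string as a stand-in for the SQL that dplyr would generate. It does not contact BigQuery.

```r
# Project and dataset names from the question (placeholders).
project <- "my_project"
dataset <- "dataset1"

# One table per day, assumed to be named YYYYMMDD.
days <- seq(as.Date("2015-01-01"), as.Date("2015-01-31"), by = "day")
table_names <- format(days, "%Y%m%d")

# Comma-separated list of fully qualified tables, as in the answer above.
multi_tables <- paste0("[", project, ":", dataset, ".", table_names, "]",
                       collapse = ", ")

# `single_sql` is a hypothetical stand-in for the generated single-table query.
single_sql <- "SELECT [field1] FROM [20150101]"

# Replace the single table reference with the full list (literal match).
multi_sql <- sub("[20150101]", multi_tables, single_sql, fixed = TRUE)
```

The resulting `multi_sql` string could then be passed to `query_exec()` in the same way as the edited query above. The comma-separated table list relies on BigQuery's legacy SQL, where it acts as a union of the listed tables.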