Is there a way to use bind_rows() on a set of data frames without first collecting them from the database?
Say I've defined a couple of dplyr query tables:
```r
mydatabase <- src_mysql('database')
table1 <- tbl(mydatabase, "table1")
table2 <- tbl(mydatabase, "table3")
foo <- table1 %>% filter(id > 10) %>% select(id)
bar <- table2 %>% select(id)
```
I'd like to be able to combine foo and bar; in essence, I'd like to perform a union of the two subqueries without having to drop to SQL. However, when I try that, I get an error, because I'm passing two tbl_sql objects to bind_rows() rather than real data frames:
```r
unioned_data_frame <- bind_rows(foo, bar)
#> Error: incompatible sizes (1 != 8)
```
Any suggestions? In this toy example, writing the whole query in SQL wouldn't be a problem, but of course, in real life, foo and bar are often significantly more complicated.
Using dplyr::union() performs a SQL UNION on the two queries, although it's important to note that dplyr::union() removes duplicate rows (as SQL UNION does). Using dplyr::union_all() generates a UNION ALL instead, which keeps duplicate rows, like bind_rows(). Both results stay lazy, so nothing is pulled into R until you call collect().
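A minimal sketch of the difference, assuming the dbplyr and RSQLite packages are installed. An in-memory SQLite database stands in for the MySQL source from the question, and the table contents are made up so that ids 11 and 12 appear in both queries. show_query() prints the generated SQL without running it.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite replaces the MySQL connection from the question
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical example data; ids 11 and 12 will appear in both queries
dbWriteTable(con, "table1", data.frame(id = c(5L, 11L, 12L)))
dbWriteTable(con, "table2", data.frame(id = c(11L, 12L, 20L)))

foo <- tbl(con, "table1") %>% filter(id > 10) %>% select(id)
bar <- tbl(con, "table2") %>% select(id)

# Both calls only build lazy SQL; no rows are fetched yet
deduped <- union(foo, bar)      # SQL UNION: duplicate rows removed
stacked <- union_all(foo, bar)  # SQL UNION ALL: duplicate rows kept

show_query(stacked)

# collect() executes the query and returns a local tibble
deduped_df <- collect(deduped)
stacked_df <- collect(stacked)

dbDisconnect(con)
```

Here deduped_df has three rows (11, 12, 20) while stacked_df has five, because UNION ALL keeps the rows that both queries return.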
Unfortunately, there isn't a way to get the other benefits of bind_rows() on remote tables, particularly its very useful .id argument.
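One manual workaround, sketched here rather than offered as built-in dbplyr support, is to tag each lazy query with a constant column via mutate() before calling union_all(). The sketch assumes dbplyr and RSQLite are installed, uses an in-memory SQLite database in place of MySQL with made-up table contents, and the column name source is hypothetical; it mimics what bind_rows(..., .id = "source") would add to local data frames.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite replaces the MySQL connection from the question
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "table1", data.frame(id = c(5L, 11L, 12L)))
dbWriteTable(con, "table2", data.frame(id = c(11L, 12L, 20L)))

foo <- tbl(con, "table1") %>% filter(id > 10) %>% select(id)
bar <- tbl(con, "table2") %>% select(id)

# Hypothetical `source` column: a hand-made stand-in for bind_rows(.id = ...)
tagged <- union_all(
  foo %>% mutate(source = "foo"),
  bar %>% mutate(source = "bar")
)

# Run the query, then sort locally for a predictable row order
result <- tagged %>% collect() %>% arrange(source, id)

dbDisconnect(con)
```

Because the tag is an ordinary column, it travels through the generated SQL like any other literal, so the union still runs entirely in the database.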