Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using dplyr to bind rows inside a database

I have two large tables, old_customers and new_customers and I want to row bind them. This is done but either one is too large to load in-memory. However, I'm not sure if bind_rows will allow binding of two tables without them being both data frames. I get this error when I try

old <- tbl(conn, 'old_customers')
new <- tbl(conn, 'new_customers')
old %>% bind_rows(new)
Error in bind_rows_(x, .id) : 
  Argument 1 must be a data frame or a named atomic vector, not a tbl_dbi/tbl_sql/tbl_lazy/tbl

The other option would be something like

old <- as.data.frame(tbl(conn, 'old_customers'))
new <- as.data.frame(tbl(conn, 'new_customers'))
old %>% bind_rows(new)

But again, I am trying to avoid loading this dataset into memory. Is there a way around this?

like image 866
Minh Mai Avatar asked Jan 29 '23 02:01

Minh Mai


1 Answers

You can use the union_all function, which operates very similarly to an rbind or bind_rows, but should have the added advantage of not necessitating the loading of data in memory.

old <- tbl(conn, 'old_customers')
new <- tbl(conn, 'new_customers')
old %>% union_all(new)
like image 54
bouncyball Avatar answered Feb 04 '23 22:02

bouncyball