Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dbplyr, dplyr, and functions with no SQL equivalents [eg `slice()`]

library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

I can create this mock SQL database above. And it's very cool that I can perform standard dplyr functions on this "database":

mtcars2 %>% 
  group_by(cyl) %>% 
  summarise(mpg = mean(mpg, na.rm = TRUE)) %>% 
  arrange(desc(mpg))
#> # Source:     lazy query [?? x 2]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Ordered by: desc(mpg)
#>     cyl   mpg
#>   <dbl> <dbl>
#> 1     4  26.7
#> 2     6  19.7
#> 3     8  15.1

It appears I'm unable to use dplyr functions that have no direct SQL equivalents, (eg dplyr::slice()). In the case of slice() I can use the alternative combination of filter() and row_number() to get the same results as just using slice(). But what happens when there's not such an easy workaround?

mtcars2 %>% slice(1:5)
#>Error in UseMethod("slice_") : 
#>  no applicable method for 'slice_' applied to an object of class 
#>  "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

When dplyr functions have no direct SQL equivalents can I force their use with dbplyr, or is the only option to get creative with dplyr verbs that do have SQL equivalents, or just write the SQL directly (which is not my preferred solution)?

like image 294
Display name Avatar asked Dec 06 '19 17:12

Display name


1 Answers

I understood this question: How can I make slice() work for SQL databases? This is different from "forcing their use" but still might work in your case.

The example below shows how to implement a "poor man's" variant of slice() that works on the database. We still need to do the legwork and implement it with verbs that work on the database, but then we can use it similarly to data frames.

Read more about S3 classes in http://adv-r.had.co.nz/OO-essentials.html#s3.

library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

# mtcars2 has a class attribute
class(mtcars2)
#> [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
#> [4] "tbl_lazy"             "tbl"

# slice() is an S3 method
slice
#> function(.data, ..., .preserve = FALSE) {
#>   UseMethod("slice")
#> }
#> <bytecode: 0x560a03460548>
#> <environment: namespace:dplyr>

# we can implement a "poor man's" variant of slice()
# for the particular class. (It doesn't work quite the same
# in all cases.)
#' @export
slice.tbl_sql <- function(.data, ...) {
  rows <- c(...)

  .data %>%
    mutate(...row_id = row_number()) %>%
    filter(...row_id %in% !!rows) %>%
    select(-...row_id)
}

mtcars2 %>%
  slice(1:5)
#> # Source:   lazy query [?? x 11]
#> # Database: sqlite 3.29.0 [:memory:]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
#> 4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
#> 5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2

Created on 2019-12-07 by the reprex package (v0.3.0)

like image 109
krlmlr Avatar answered Nov 18 '22 12:11

krlmlr