Way to use glue_sql() and avoid paste in dynamic SELECT statement?

Tags: sqlite, r, r-glue

I'm learning how to query SQLite databases from R, and I'm building those queries with glue_sql(). Below is a simplified example of a sub-query from my workflow. Is there a way I can create s10_wtX and s20_wtX without using paste0(), as in the code below?

library(DBI)
library(dplyr)
library(glue)

# example database
set.seed(1)
ps <- data.frame(plot = rep(1:3, each = 4),
                 spp = rep(1:3*10, 2),
                 wtX = rnorm(12, 10, 2) %>% round(1))
con <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(con, "ps", ps)

# species of interest
our_spp <- c(10, 20)

# for the spp of interest, sum wtX on each plot
sq <- glue_sql(paste0(
  'SELECT ps.plot,\n',
  paste0('SUM(CASE WHEN ps.spp = ', our_spp,
         ' THEN (ps.wtX) END) AS s', our_spp,
         '_wtX',
         collapse = ',\n'), '\n',
  '  FROM ps
    WHERE ps.spp IN ({our_spp*}) -- spp in our sample
    GROUP BY ps.plot'),
  .con = con)

# the result of the query should look like:
dbGetQuery(con, sq)
  plot s10_wtX s20_wtX
1    1    21.9    10.4
2    2    11.0    22.2
3    3     9.4    13.0

In my actual workflow, I have more than two species of interest, so I'd rather not fully write out each line (e.g., SUM(CASE WHEN ps.spp = 10 THEN (ps.wtX) END) AS s10_wtX).

CzechInk asked May 08 '26 00:05


2 Answers

The OP's original question is:

Is there a way I can create s10_wtX and s20_wtX without using paste0(), as in the code below?

To build the query with glue alone, use glue() to generate one column expression per species and glue_collapse() to join them:

library(glue)
sq1 <- glue_sql(
  'SELECT ps.plot,',
  # one SUM(CASE ...) column per species, joined with commas
  glue_collapse(
    glue('SUM(CASE WHEN ps.spp = {our_spp} THEN (ps.wtX) END) AS s{our_spp}_wtX'),
    sep = ",\n"),
  '\nFROM ps\n WHERE ps.spp IN ({our_spp*}) -- spp in our sample\n    GROUP BY ps.plot',
  .con = con)
dbGetQuery(con, sq1)
  plot s10_wtX s20_wtX
1    1    21.9    10.4
2    2    11.0    22.2
3    3     9.4    13.0
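
With more than two species of interest, the same glue() / glue_collapse() pattern can be wrapped in a small function. This is only a sketch: build_spp_query is a hypothetical helper name, not part of glue or DBI, and it assumes the species codes are numeric, since they are interpolated into the column alias with plain glue() and are not quoted there. The example data from the question is recreated so the block runs on its own.

```r
library(DBI)
library(glue)

# same example data as in the question
set.seed(1)
ps <- data.frame(plot = rep(1:3, each = 4),
                 spp = rep(1:3 * 10, 4),
                 wtX = round(rnorm(12, 10, 2), 1))
con <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(con, "ps", ps)

# hypothetical helper: builds one SUM(CASE ...) column per species code
build_spp_query <- function(spp, con) {
  # assumption: numeric codes only, because the alias is built with plain glue()
  stopifnot(is.numeric(spp))
  cols <- glue_collapse(
    glue("SUM(CASE WHEN ps.spp = {spp} THEN ps.wtX END) AS s{spp}_wtX"),
    sep = ",\n")
  glue_sql(
    "SELECT ps.plot,\n", cols,
    "\nFROM ps\nWHERE ps.spp IN ({spp*})\nGROUP BY ps.plot",
    .con = con)
}

# three species instead of two; no column is written out by hand
sq3 <- build_spp_query(c(10, 20, 30), con)
res <- dbGetQuery(con, sq3)
names(res)

dbDisconnect(con)
```

The values inside IN (...) still go through glue_sql() and are quoted by the connection; only the column aliases are pasted in as raw text, which is why the numeric check is there.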
akrun answered May 09 '26 13:05


To formalize my comments a little (even if this is not what you ultimately use): instead of building one column per species in SQL, aggregate in long form, with one row per plot and species:

out <- DBI::dbGetQuery(con, "
  select ps.plot, ps.spp, sum(ps.wtX) as wtX
  from ps
  where ps.spp in (10,20)
  group by ps.plot, ps.spp")
out
#   plot spp  wtX
# 1    1  10 21.9
# 2    1  20 10.4
# 3    2  10 11.0
# 4    2  20 22.2
# 5    3  10  9.4
# 6    3  20 13.0
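
The species list in the query above is written out by hand. A sketch of the same long-form query with the species vector interpolated by glue_sql(), assuming the ps table and our_spp from the question (recreated here so the block is self-contained):

```r
library(DBI)
library(glue)

# same example data as in the question
set.seed(1)
ps <- data.frame(plot = rep(1:3, each = 4),
                 spp = rep(1:3 * 10, 4),
                 wtX = round(rnorm(12, 10, 2), 1))
con <- dbConnect(RSQLite::SQLite(), "")
dbWriteTable(con, "ps", ps)

our_spp <- c(10, 20)

# {our_spp*} expands to a comma-separated, properly quoted list
q_long <- glue_sql("
  SELECT ps.plot, ps.spp, SUM(ps.wtX) AS wtX
    FROM ps
   WHERE ps.spp IN ({our_spp*})
   GROUP BY ps.plot, ps.spp",
  .con = con)

out <- dbGetQuery(con, q_long)
out

dbDisconnect(con)
```

Because the query text no longer depends on how many species there are, no paste0() or glue_collapse() is needed on the SQL side.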

This can be easily pivoted to what you need. Using tidyr::pivot_wider, for instance,

tidyr::pivot_wider(out, id_cols = plot, names_from = "spp", values_from = "wtX")
# # A tibble: 3 x 3
#    plot  `10`  `20`
#   <int> <dbl> <dbl>
# 1     1  21.9  10.4
# 2     2  11    22.2
# 3     3   9.4  13  

(The names will need to be cleaned up.)
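
One way to do that clean-up during the pivot itself is the names_glue argument of tidyr::pivot_wider(). A minimal sketch, with the long-form result typed in by hand (values copied from the output above) so that it runs without a database connection:

```r
library(tidyr)

# long-form result, copied from the query output above
out <- data.frame(plot = c(1L, 1L, 2L, 2L, 3L, 3L),
                  spp  = c(10, 20, 10, 20, 10, 20),
                  wtX  = c(21.9, 10.4, 11.0, 22.2, 9.4, 13.0))

# names_glue builds each new column name from the value of spp
wide <- pivot_wider(out,
                    id_cols = plot,
                    names_from = spp,
                    values_from = wtX,
                    names_glue = "s{spp}_wtX")
names(wide)
```

This gives the s10_wtX / s20_wtX naming from the question without a separate renaming step.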

r2evans answered May 09 '26 13:05