A few posters have asked similar questions here, and those answers have taken me 80% of the way toward reading text files containing SQL queries into R for use as input to RODBC:
Import multiline SQL query to single string
RODBC Temporary Table Issue when connecting to MS SQL Server
However, my SQL files contain quite a few comments (e.g. --comment on this and that). My question is: how would one either strip comment lines from the query on import, or make sure the resulting string keeps its line breaks, so that actual query text is not appended to comments?
For example, query6.sql:
--query 6
select a6.column1,
a6.column2,
count(a6.column3) as counts
--count the number of occurrences in table 1
from data.table a6
group by a6.column1
becomes:
sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurrences in table 1from data.table a6 group by a6.column1"
when read into R.
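A minimal sketch of the line-break-preserving approach the question asks about: collapse with `"\n"` rather than `' '`, so each `--` comment ends at its own line break. (The inline vector stands in for `readLines('SQL/query6.sql')`; the file path itself is from the question.)

```r
# Stand-in for readLines('SQL/query6.sql')
lines <- c("--query 6",
           "select a6.column1,",
           "from data.table a6")

# Join with newlines instead of spaces: the comment on the first
# line can no longer swallow the "select" that follows it.
sqlStr <- paste(lines, collapse = "\n")
```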
Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:
> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
This works too:
> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
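The same fix works for a query stored on disk: read the file with `readLines()` and join the lines with `"\n"`, so the `--` comment cannot swallow the line that follows it. (The temp file here is illustrative and stands in for a real .sql file.)

```r
# Write the two-line query from above to a temporary .sql file
path <- tempfile(fileext = ".sql")
writeLines(c("select * -- my select statement", "from BOD"), path)

# Read it back and join with newlines, not spaces
sql3 <- paste(readLines(path), collapse = "\n")
# sql3 can now be passed straight to sqldf(sql3), as above

file.remove(path)
```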
I had trouble with the other answer, so I modified Roman's and made a little function. It has worked for all my test cases, including files with multiple comments and both full-line and partial-line comments.
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)]           # remove full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q)  # remove mid-line comments
  q <- paste(q, collapse = " ")
  return(q)
}
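A usage sketch, with the function repeated so the example is self-contained (the temp file stands in for the question's query6.sql):

```r
# read.sql as defined above
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)]           # remove full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q)  # remove mid-line comments
  q <- paste(q, collapse = " ")
  return(q)
}

# Write a small query with both comment styles to a temp file
path <- tempfile(fileext = ".sql")
writeLines(c("--query 6",
             "select a6.column1,",
             "count(a6.column3) as counts --count them",
             "from data.table a6"), path)

out <- read.sql(path)  # both kinds of comment are stripped
file.remove(path)
```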
Function clean_query:
require(tidyverse)

# pass in either a text query or a path to a .sql file
clean_query <- function(text_or_path = '//example/path/to/some_query.sql') {
  # if given a .sql path, read it; otherwise assume text input
  if (str_detect(text_or_path, "(?i)\\.sql$")) {
    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")
  }
  # echo the original query to the console
  # (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")
  # return
  text_or_path %>%
    # remove all demarcated /* */ sql comments
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
    # remove all -- comments
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
    # remove everything after the query-ending semicolon
    gsub(pattern = ';.*', replacement = ' ') %>%
    # remove any line breaks, tabs, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
    # collapse extra whitespace
    gsub(pattern = ' +', replacement = ' ')
}
You could glue the regexes together into one incomprehensibly long expression, but I recommend readable code.
Applied to the question's query6.sql, the function returns:
[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "
query <- "
/* this query has
intentionally messy
comments
*/
Select
COL_A -- with a comment here
,COL_B
,COL_C
FROM
-- and some helpful comment here
Database.Datatable
;
-- or wherever
/* and some more comments here */
"
Call the function:
clean_query(query)
Output:
[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "
If you want to test reading from a .sql file:
temp_path <- path.expand("~/query.sql")
cat(query, file = temp_path)
clean_query(temp_path)
file.remove(temp_path)