
SQL query with comments import into R from file

Tags: sql, r

A few posters have asked similar questions here, and the answers took me about 80% of the way toward reading text files containing SQL queries into R for use as input to RODBC:

Import multiline SQL query to single string

RODBC Temporary Table Issue when connecting to MS SQL Server

However, my SQL files contain quite a few comments (written as --comment on this and that). How can I either strip the comment lines from the query on import, or make sure the resulting string keeps its line breaks, so that actual query text is not appended to a comment?

For example, query6.sql:

--query 6
select a6.column1, 
    a6.column2,
    count(a6.column3) as counts
--count the number of occurrences in table 1 
from data.table a6
group by a6.column1

becomes:

sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr 
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurrences in table 1from data.table a6 group by a6.column1"

when read into R.

carnust asked Dec 17 '13 11:12


3 Answers

Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:

> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

This works too:

> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
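
Applied to the file from the question, the same idea amounts to collapsing with "\n" instead of a space, so that every -- comment still ends at its own line break. The following is a minimal sketch: the helper name read_sql_keep_newlines and the temporary file are made up for illustration, and whether a given ODBC driver accepts embedded comments is an assumption you would need to check against your database.

```r
# Hypothetical helper: read a .sql file into ONE string, keeping line breaks,
# so a "--" comment cannot swallow the query text that follows it.
read_sql_keep_newlines <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Temporary file standing in for SQL/query6.sql from the question
tmp <- tempfile(fileext = ".sql")
writeLines(c("--query 6",
             "select a6.column1,",
             "    a6.column2,",
             "    count(a6.column3) as counts",
             "--count the number of occurrences in table 1",
             "from data.table a6",
             "group by a6.column1"), tmp)

sqlStr <- read_sql_keep_newlines(tmp)
unlink(tmp)    # clean up the temporary file
cat(sqlStr, "\n")  # prints the query with its original line structure
```

The resulting string could then be passed to the query function in place of the space-collapsed version.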
G. Grothendieck answered Nov 01 '22 07:11


I had trouble with the other answer, so I modified Roman's and made a little function. This has worked for all my test cases, including multiple comments, single-line and partial-line comments.

read.sql <- function(filename, silent = TRUE) {
    q <- readLines(filename, warn = !silent)
    q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
    q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
    q <- paste(q, collapse = " ")
    return(q)
}
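
A quick way to try the function is against a temporary file, as sketched below. The file contents are invented for illustration, and read.sql is repeated only so the snippet runs on its own. The last line of the test file also shows a limitation worth knowing about: because the pattern "--.*" is applied to the raw text, a -- inside a quoted string literal is treated as a comment too.

```r
# Copy of read.sql from above, repeated so this snippet is self-contained
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)]          # drop full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q) # drop partial-line comments
  paste(q, collapse = " ")
}

# Made-up test file with both comment styles and a "--" inside a literal
tmp <- tempfile(fileext = ".sql")
writeLines(c("-- full-line comment",
             "select col1, -- partial-line comment",
             "       col2",
             "from tbl",
             "where note = 'a--b'"), tmp)

res <- read.sql(tmp)
unlink(tmp)
print(res)  # comments are gone, but the literal 'a--b' is cut off after 'a
```

If your queries contain -- inside string literals, the function would need a smarter parser than a single regular expression.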
Gregor Thomas answered Nov 01 '22 05:11


Summary

Function clean_query:

  • Removes both /* */ block comments and -- line comments
  • Creates single string output
  • Takes a SQL path or text string
  • Is simple

Function

library(tidyverse)  # provides %>%, str_detect(), str_c() and read_lines()

# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){


  # if sql path, read, otherwise assume text input
  if( str_detect(text_or_path, "(?i)\\.sql$") ){

    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")

  }


  # echo original query to the console 
  #  (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")


  # return
  text_or_path %>% 
    # remove all demarked /*  */ sql comments 
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>% 
    # remove all demarked -- comments 
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>% 
    # remove everything after the query-end semicolon 
    gsub(pattern = ';.*', replacement = ' ') %>% 
    # remove any line break, tab, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%  
    # remove extra whitespace 
    gsub(pattern = ' +', replacement = ' ') 

}

You could chain the regexps into a single pattern if you want an incomprehensibly long expression, but I recommend readable code.
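
If you would rather not load the tidyverse for this, the same comment-stripping steps can be sketched in base R. This is only an illustration under a few assumptions: the name strip_sql_comments is hypothetical, it uses perl = TRUE with the (?s) flag so that . is guaranteed to match across line breaks, and it deliberately omits the "remove everything after the semicolon" step.

```r
# Hypothetical base-R variant: strip comments and collapse to a single line
strip_sql_comments <- function(sql) {
  sql <- paste(sql, collapse = "\n")                        # accept a vector of lines
  sql <- gsub("(?s)/\\*.*?\\*/", " ", sql, perl = TRUE)     # /* ... */ block comments
  sql <- gsub("--[^\r\n]*", " ", sql, perl = TRUE)          # -- line comments
  sql <- gsub("[[:space:]]+", " ", sql)                     # squeeze all whitespace
  trimws(sql)                                               # drop leading/trailing space
}

query <- "/* multi
line */
Select COL_A -- c
 ,COL_B
FROM
 -- x
 Database.Datatable"

out <- strip_sql_comments(query)
print(out)
# [1] "Select COL_A ,COL_B FROM Database.Datatable"
```

Block comments are removed before line comments so that a -- appearing inside a /* */ block does not leave the closing */ behind.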



Output for "query6.sql"

[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "



Additional Text Input Example

query <- "

    /* this query has 
    intentionally messy 
    comments
    */

    Select 
       COL_A -- with a comment here
      ,COL_B
      ,COL_C
    FROM 
      -- and some helpful comment here
      Database.Datatable
    ;
    -- or wherever

    /* and some more comments here */

"

Call function:

clean_query(query)

Output:

[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "



If you want to test reading from a .sql file:

temp_path <- path.expand("~/query.sql")

cat(query, file = temp_path)

clean_query(temp_path)

file.remove(temp_path)
Tori Oblad answered Nov 01 '22 07:11