
Extending dplyr and use of internal functions

Tags: sql, r, dplyr, r-s3

I'm working on a fork of the RSQLServer package and am trying to implement joins. With the current version of the package, joins for any DBI-connected database are implemented using sql_join.DBIConnection. However, that implementation doesn't work well for SQL Server. For instance, it makes use of USING, which SQL Server does not support.
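To illustrate the USING limitation, here is a minimal sketch of building an explicit ON clause from the paired join columns instead. The helper name build_on_clause, the table aliases "x" and "y", and the double-quote identifier quoting (which assumes QUOTED_IDENTIFIER is ON for the connection) are all assumptions made for illustration; none of this is taken from RSQLServer or dplyr.

```r
# Hypothetical helper (not part of dplyr or RSQLServer): build an explicit
# ON clause so the generated SQL does not rely on USING.
# `by_x` and `by_y` are the paired join columns of the left and right tables.
build_on_clause <- function(by_x, by_y, x_alias = "x", y_alias = "y") {
  stopifnot(length(by_x) == length(by_y), length(by_x) > 0)

  # Assumption: double-quoted identifiers are accepted by the connection.
  quote_ident <- function(name) {
    paste0('"', gsub('"', '""', name, fixed = TRUE), '"')
  }

  # One equality condition per pair of join columns
  conditions <- paste0(
    quote_ident(x_alias), ".", quote_ident(by_x),
    " = ",
    quote_ident(y_alias), ".", quote_ident(by_y)
  )
  paste("ON", paste(conditions, collapse = " AND "))
}

on_clause <- build_on_clause(c("id", "year"), c("id", "yr"))
print(on_clause)
```

Because the left and right column names are passed separately, the same helper covers joins where the key columns are named differently in the two tables, which USING cannot express.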

I've got a version of this function, sql_join.SQLServerConnection, working (though it's not complete yet). I've based my function on sql_join.DBIConnection as much as possible. One issue I've had is that sql_join.DBIConnection calls a number of non-exported dplyr functions, such as common_by. For now, I've worked around this by using dplyr:::common_by, but I'm aware that this isn't ideal practice.

Should I:

  1. Ask Hadley Wickham/Romain Francois to export the relevant functions to make life easier for people developing packages that build on dplyr?
  2. Copy the internal functions into the package I'm working on?
  3. Continue to use the ::: operator to call the functions?
  4. Something else?

Clearly with option 3, there's a chance that the interface will change (since they're not exported functions) and that the package would break in the longer term.

Sample code:

sql_join.SQLServerConnection <- function(con, x, y, type = "inner", by = NULL, ...) {
  # Map the join type to its SQL keyword; fail on anything unrecognised
  join <- switch(type,
    left = sql("LEFT"),
    inner = sql("INNER"),
    right = sql("RIGHT"),
    full = sql("FULL"),
    stop("Unknown join type: ", type, call. = FALSE)
  )
  by <- dplyr:::common_by(by, x, y)
  # USING is not supported by SQL Server, so it is always disabled here
  using <- FALSE # all(by$x == by$y)
  x_names <- dplyr:::auto_names(x$select)
  y_names <- dplyr:::auto_names(y$select)
  # more code
}
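One way to combine options 2 and 3 is to look the internal function up at run time and fall back to a local copy if it has gone away. The following is only a sketch under stated assumptions: get_internal and local_common_by are hypothetical names, and local_common_by is a simplified stand-in that works on column names, not a copy of dplyr's actual common_by.

```r
# Hypothetical helper: fetch a (possibly non-exported) function from a package
# namespace, returning `fallback` if the package or the function is missing.
get_internal <- function(pkg, name, fallback) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(fallback)
  }
  fn <- get0(name, envir = asNamespace(pkg), mode = "function", inherits = FALSE)
  if (is.null(fn)) fallback else fn
}

# Simplified stand-in, NOT dplyr's implementation: resolve the join columns
# for two inputs that have names().
local_common_by <- function(by = NULL, x, y) {
  if (is.null(by)) {
    by <- intersect(names(x), names(y))
    if (length(by) == 0) {
      stop("No common variables. Please specify `by`.", call. = FALSE)
    }
    return(list(x = by, y = by))
  }
  # A named vector c(a = "b") means: x column "a" joins y column "b"
  by_x <- names(by)
  if (is.null(by_x)) by_x <- unname(by)
  unnamed <- by_x == ""
  by_x[unnamed] <- by[unnamed]
  list(x = by_x, y = unname(by))
}

# Use dplyr's function when it exists, otherwise the local stand-in
common_by <- get_internal("dplyr", "common_by", fallback = local_common_by)
```

This keeps the ::: dependency in one place, so a change in dplyr's internals degrades to the local copy rather than breaking the package outright. The trade-off is that the fallback has to be kept behaviourally close to the function it replaces.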
asked Jul 10 '15 15:07 by Nick Kennedy



1 Answer

It looks to me like you may not have to use those functions. Since dplyr has moved its database functionality into dbplyr, the relevant code is here. I don't see auto_names or common_by used there.

I strongly recommend following the steps in Creating New Backends after reading SQL Translation.

It may also be worth reviewing other alternative backends, such as hrbrmstr's sergeant package for Apache Drill using JDBC.

answered Oct 13 '22 13:10 by JBecker