I'm working on a fork of the RSQLServer package and am trying to implement joins. With the current version of the package, joins for any DBI-connected database are implemented using sql_join.DBIConnection. However, that implementation doesn't work well for SQL Server. For instance, it makes use of USING, which is not supported by SQL Server.
I've got a version of this function, sql_join.SQLServerConnection, working (though not complete yet). I've based my function on sql_join.DBIConnection as much as possible. One issue I've had is that sql_join.DBIConnection calls a number of non-exported functions within dplyr, such as common_by. For now, I've worked around this by using dplyr:::common_by, but I'm aware that that's not ideal practice.
Should I:
1. [...] dplyr?
2. [...]
3. [...] the ::: operator to call the functions?
Clearly with option 3, there's a chance that the interface will change (since they're not exported functions) and that the package would break in the longer term.
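To make the risk with ::: concrete, here is a minimal sketch of a guarded lookup. The helper name get_internal and its fallback argument are my own invention for illustration and are not part of dplyr or RSQLServer; the only real call it relies on is utils::getFromNamespace. The idea is that if the internal function disappears or the package is unavailable, a local copy is used instead of failing.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# look up a possibly non-exported function by name and return `fallback`
# if the package or the function cannot be found.
get_internal <- function(name, pkg, fallback) {
  tryCatch(
    utils::getFromNamespace(name, pkg),
    error = function(e) fallback
  )
}

# Demonstration with a base package so the sketch runs without dplyr.
local_copy <- function(...) stop("local copy not implemented")
f <- get_internal("no_such_function_xyz", "stats", fallback = local_copy)
identical(f, local_copy)
```

In the package this might be used as common_by <- get_internal("common_by", "dplyr", fallback = my_common_by), where my_common_by would be a locally maintained copy. This only softens the breakage; it does not make relying on internals good practice.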
Sample code:
sql_join.SQLServerConnection <- function(con, x, y, type = "inner", by = NULL, ...) {
  # Map the join type to its SQL keyword; fail early on anything else
  join <- switch(type,
    left = sql("LEFT"), inner = sql("INNER"),
    right = sql("RIGHT"), full = sql("FULL"),
    stop("Unknown join type: ", type, call. = FALSE)
  )
  # Non-exported dplyr helpers, called via ::: for now
  by <- dplyr:::common_by(by, x, y)
  # USING is not supported by SQL Server, so it is disabled here
  using <- FALSE # all(by$x == by$y)
  x_names <- dplyr:::auto_names(x$select)
  y_names <- dplyr:::auto_names(y$select)
  # more code
}
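Since USING is unavailable, the join condition has to be spelled out with ON. The following is only a sketch of how that condition could be assembled from the by columns; build_on_clause, its arguments, and the table aliases x and y are hypothetical names of mine, not functions from dplyr or RSQLServer, and it uses base R string handling rather than dplyr's SQL helpers.

```r
# Hypothetical helper (an assumption for illustration): build an ON clause
# from two equal-length vectors of column names.
build_on_clause <- function(by_x, by_y, x_alias = "x", y_alias = "y") {
  stopifnot(length(by_x) == length(by_y), length(by_x) > 0)
  # Quote identifiers with square brackets, doubling any closing bracket
  quote_ident <- function(id) {
    paste0("[", gsub("]", "]]", id, fixed = TRUE), "]")
  }
  conds <- paste0(
    quote_ident(x_alias), ".", quote_ident(by_x),
    " = ",
    quote_ident(y_alias), ".", quote_ident(by_y)
  )
  # Combine the per-column equalities into a single condition
  paste0("ON ", paste(conds, collapse = " AND "))
}

build_on_clause(c("id", "year"), c("id", "yr"))
```

Unlike USING, this form also handles key columns that are named differently in the two tables, which is why the using flag above can simply stay FALSE.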
It looks to me like you may not have to use those functions. Since dplyr has now moved its database functionality into dbplyr, the relevant code is here. I don't see the use of auto_names or common_by there.
I strongly recommend following the steps in Creating New Backends after reading SQL Translation.
It may also be worth reviewing some alternative backends, such as hrbrmstr's sergeant package for Apache Drill using JDBC.