The dispatch mechanism of the R functions rbind() and cbind() is non-standard. I explored some possibilities of writing rbind.myclass() or cbind.myclass() functions when one of the arguments is a data.frame, but so far I do not have a satisfactory approach. This post concentrates on rbind, but the same holds for cbind.
Let us create an rbind.myclass() function that simply echoes when it has been called.
rbind.myclass <- function(...) "hello from rbind.myclass"
We create an object of class myclass, and the following calls to rbind all properly dispatch to rbind.myclass():
a <- "abc"
class(a) <- "myclass"
rbind(a, a)
rbind(a, "d")
rbind(a, 1)
rbind(a, list())
rbind(a, matrix())
However, when one of the arguments is a data frame (this need not be the first one), rbind() will call base::rbind.data.frame() instead:
rbind(a, data.frame())
This behavior is a little surprising, but it is actually documented in the Dispatch section of the rbind() help page. The advice given there is:
If you want to combine other objects with data frames, it may be necessary to coerce them to data frames first.
In practice, this advice may be difficult to implement. Conversion to a data frame may remove essential class information. Moreover, a user who is unaware of the advice may be stuck with an error or an unexpected result after issuing the command rbind(a, x).
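To illustrate how coercion can drop class information, here is a small sketch. It assumes the object is unclassed before conversion, because no as.data.frame method exists for myclass in this example; the name df is only for illustration.

```r
a <- "abc"
class(a) <- "myclass"

# No as.data.frame method is defined for "myclass" here, so strip the class first
df <- as.data.frame(unclass(a))

inherits(a, "myclass")   # TRUE
inherits(df, "myclass")  # FALSE: the class information is gone
class(df)                # "data.frame"
```

Any method that relied on the myclass attribute can no longer recognise the converted object.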
A first possibility is to warn the user that the call to rbind(a, x) should not be made when x is a data frame. Instead, the user of package mypackage should make an explicit call to a hidden function:
mypackage:::rbind.myclass(a, x)
This can be done, but the user has to remember to make the explicit call when needed. Calling the hidden function is something of a last resort, and should not be regular policy.
Alternatively, I tried to shield the user by intercepting dispatch. My first try was to provide a local definition of base::rbind.data.frame():
rbind.data.frame <- function(...) "hello from my rbind.data.frame"
rbind(a, data.frame())
rm(rbind.data.frame)
This fails because rbind() is not fooled into calling rbind.data.frame from the .GlobalEnv, and calls the base version as usual.
Another strategy is to override rbind() with a local function, as suggested in S3 dispatching of `rbind` and `cbind`.
rbind <- function(...) {
  args <- list(...)
  # inherits() also copes with unclassed objects and objects with several classes
  if (length(args) > 0 && inherits(args[[1]], "myclass")) return(rbind.myclass(...))
  base::rbind(...)
}
This works perfectly for dispatching to rbind.myclass(), so the user can now type rbind(a, x) for any type of object x.
rbind(a, data.frame())
The downside is that after library(mypackage) we get the message "The following objects are masked from ‘package:base’: rbind".
While technically everything works as expected, there should be better ways than overriding a base function.
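One way to sidestep the masking message, sketched under assumptions: expose a wrapper under a different name instead of redefining rbind. The name rbind_myclass_aware is hypothetical, and unlike the override above this sketch checks every argument rather than only the first. The user still has to call the wrapper explicitly, so this does not solve the dispatch problem itself.

```r
rbind.myclass <- function(...) "hello from rbind.myclass"

# Hypothetical wrapper: routes to rbind.myclass() if any argument has class "myclass"
rbind_myclass_aware <- function(...) {
  args <- list(...)
  has_myclass <- vapply(args, inherits, logical(1), what = "myclass")
  if (any(has_myclass)) rbind.myclass(...) else base::rbind(...)
}

a <- structure("abc", class = "myclass")
rbind_myclass_aware(a, data.frame())   # handled by rbind.myclass()
rbind_myclass_aware(data.frame(), a)   # also handled: myclass is the second argument
rbind_myclass_aware(1, 2)              # falls through to base::rbind()
```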
None of the above alternatives is satisfactory. I have read about alternatives using S4 dispatch, but so far I have not located any implementations of the idea. Any help or pointers?
As you mention yourself, using S4 would be one good solution that works nicely. I have not investigated this recently with data frames, as I am much more interested in other generalized matrices, in both of my long-time CRAN packages 'Matrix' ("recommended", i.e. part of every R distribution) and 'Rmpfr'.
There are actually two different ways:
1) Rmpfr uses the newer way of defining methods for the '...' argument in rbind()/cbind(). This is well documented in ?dotsMethods (mnemonic: '...' = dots) and implemented in Rmpfr/R/array.R, line 511 ff (e.g. https://r-forge.r-project.org/scm/viewvc.php/pkg/R/array.R?view=annotate&root=rmpfr)
2) Matrix uses the older approach of defining (S4) methods for rbind2() and cbind2(). If you read ?rbind, it mentions that rbind2/cbind2 are used, and when. The idea there: the "2" means you define S4 methods with a signature for two matrix-like objects, and rbind/cbind applies them recursively, two at a time, to its potentially many arguments.
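A minimal sketch of the rbind2() idea from item 2, assuming a toy S4 class MyClass (hypothetical, not taken from Matrix). It only shows that an S4 method with a two-argument signature can be defined with a data frame as the second argument and reached by calling rbind2() directly. It does not establish that rbind() itself reaches this method when a data frame is among the arguments.

```r
library(methods)

# Hypothetical toy class, for illustration only
setClass("MyClass", representation(x = "character"))

# S4 method for the argument pair (MyClass, data.frame)
setMethod("rbind2", signature(x = "MyClass", y = "data.frame"),
          function(x, y, ...) "hello from rbind2(MyClass, data.frame)")

a4 <- new("MyClass", x = "abc")
res <- rbind2(a4, data.frame())
res
```

Whether this helps with the original problem depends on the conditions under which rbind() hands over to rbind2(), which ?rbind describes.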