
Dispatch of `rbind` and `cbind` for a `data.frame`

Background

The dispatch mechanism of the R functions rbind() and cbind() is non-standard. I explored some possibilities of writing rbind.myclass() or cbind.myclass() functions when one of the arguments is a data.frame, but so far I do not have a satisfactory approach. This post concentrates on rbind, but the same holds for cbind.

Problem

Let us create an rbind.myclass() function that simply reports that it has been called.

rbind.myclass <- function(...) "hello from rbind.myclass"

We create an object of class myclass, and the following calls to rbind all properly dispatch to rbind.myclass():

a <- "abc"
class(a) <- "myclass"
rbind(a, a)
rbind(a, "d")
rbind(a, 1)
rbind(a, list())
rbind(a, matrix())

However, when one of the arguments is a data frame (it need not be the first one), rbind() calls base::rbind.data.frame() instead:

rbind(a, data.frame())

This behavior is a little surprising, but it is actually documented in the "Dispatch" section of ?rbind. The advice given there is:

"If you want to combine other objects with data frames, it may be necessary to coerce them to data frames first."

In practice, this advice may be difficult to implement. Conversion to a data frame may remove essential class information. Moreover, a user who is unaware of the advice may be stuck with an error or an unexpected result after issuing the command rbind(a, x).
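
To illustrate the loss of class information, here is a minimal sketch. It assumes that the object is stripped with unclass() before it is placed in a data frame; the column name value is made up for the example.

```r
# Object of class 'myclass', as in the example above
a <- "abc"
class(a) <- "myclass"

# Manual coercion (an assumption of this sketch): drop the class, then wrap
df <- data.frame(value = unclass(a), stringsAsFactors = FALSE)

inherits(a, "myclass")         # TRUE
inherits(df, "myclass")        # FALSE: the data frame knows nothing about 'myclass'
inherits(df$value, "myclass")  # FALSE: the column is a plain character vector
```

After this step, any method written for myclass can no longer recognise the data.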

Approaches

Warn the user

A first possibility is to warn the user that the call to rbind(a, x) should not be made when x is a data frame. Instead, the user of package mypackage should make an explicit call to a hidden function:

mypackage:::rbind.myclass(a, x)

This can be done, but the user has to remember to make the explicit call when needed. Calling the hidden function is something of a last resort, and should not be regular policy.

Intercept `rbind`

Alternatively, I tried to shield the user by intercepting dispatch. My first try was to provide a local definition of base::rbind.data.frame():

rbind.data.frame <- function(...) "hello from my rbind.data.frame"
rbind(a, data.frame())
rm(rbind.data.frame)

This fails because rbind() is not fooled into calling rbind.data.frame from the .GlobalEnv; it calls the base version as usual.

Another strategy is to override rbind() with a local function, as suggested in the question "S3 dispatching of `rbind` and `cbind`".

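rbind <- function(...) {
  # inherits() also copes with a first argument that has no class attribute
  # or more than one class; attr(x, "class") == "myclass" fails in both cases
  if (nargs() > 0 && inherits(..1, "myclass")) return(rbind.myclass(...))
  base::rbind(...)
}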

This works perfectly for dispatching to rbind.myclass(), so the user can now type rbind(a, x) for any type of object x.

rbind(a, data.frame())
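
The override above looks only at the first argument, so rbind(data.frame(), a) would still go to base::rbind(). A possible variant, sketched here under the assumption that rbind.myclass() should win whenever any argument has class myclass, checks all arguments (the helper variable has_myclass is introduced for illustration only):

```r
rbind.myclass <- function(...) "hello from rbind.myclass"

# Local override: route to rbind.myclass() if ANY argument is a 'myclass' object
rbind <- function(..., deparse.level = 1) {
  has_myclass <- any(vapply(list(...), inherits, logical(1), what = "myclass"))
  if (has_myclass) {
    rbind.myclass(...)
  } else {
    base::rbind(..., deparse.level = deparse.level)
  }
}

a <- structure("abc", class = "myclass")
rbind(data.frame(), a)  # handled by rbind.myclass()
rbind(1:2, 3:4)         # falls through to base::rbind()
```

This variant still masks base::rbind(), so the drawback described next applies to it as well.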

The downside is that after library(mypackage) the user gets the message "The following objects are masked from ‘package:base’: rbind".

While technically everything works as expected, there should be better ways than a base function override.

Conclusion

None of the above alternatives is satisfactory. I have read about alternatives using S4 dispatch, but so far I have not located any implementations of the idea. Any help or pointers?


asked Dec 25 '17 09:12

Stef van Buuren

1 Answer

As you mention yourself, using S4 would be one good solution that works nicely. I have not investigated this recently with data frames, as I am much more interested in other generalized matrices, in both of my long-standing CRAN packages 'Matrix' (a "recommended" package, i.e. part of every R distribution) and 'Rmpfr'.

Actually, there are even two different ways:
1) Rmpfr uses the new way to define methods for the '...' in rbind()/cbind(). This is well documented in ?dotsMethods (mnemonic: '...' = dots) and implemented in Rmpfr/R/array.R line 511 ff (e.g. https://r-forge.r-project.org/scm/viewvc.php/pkg/R/array.R?view=annotate&root=rmpfr)

2) Matrix uses the older approach of defining (S4) methods for rbind2() and cbind2(). If you read ?rbind, it mentions that, and when, rbind2/cbind2 are used. The idea there: "2" means you define S4 methods with a signature for two ("2") matrix-like objects, and rbind/cbind applies them recursively to two of its potentially many arguments at a time.
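
The two S4 routes can be sketched roughly as follows. This is an illustration under assumptions, not the code used in Rmpfr or Matrix: the class myS4 and the returned strings are hypothetical, and the sketch does not establish whether a plain rbind(obj, data.frame()) call would reach either method.

```r
library(methods)

# Hypothetical S4 class standing in for 'myclass'
setClass("myS4", representation(x = "character"))
obj <- new("myS4", x = "abc")

# Way 1 (see ?dotsMethods): turn rbind() into an S4 generic dispatching on '...'.
# The method is selected only when every argument in '...' is a 'myS4' object
# (or an object of a subclass).
setGeneric("rbind", signature = "...")
setMethod("rbind", "myS4",
          function(..., deparse.level = 1) "hello from the S4 dots method")
rbind(obj, obj)

# Way 2: a two-argument method for rbind2(), here for the pair (myS4, data.frame)
setMethod("rbind2", signature(x = "myS4", y = "data.frame"),
          function(x, y, ...) "hello from rbind2(myS4, data.frame)")
rbind2(obj, data.frame())
```

With the first route, arguments of other classes fall through to the default method, which is the original base::rbind().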


answered Sep 19 '22 18:09

Martin Mächler
