`vec_arith` not called as expected

Tags: r, vctrs

Below is a simple case in which I define a class "foo" over a double vector. I want any arithmetic operation involving such an object to strip its "foo" class and then proceed normally.

I can make it work partially, but not robustly. See below:

library(vctrs)

x <- new_vctr(42, class = "foo")

# then this won't work (expected)
x * 2
#> Error: <foo> * <double> is not permitted

# define vec_arith method
vec_arith.foo <- function(op, x, y, ...) {
  print("we went there")
  # wrap x in vec_data to strip off the class, and forward to `vec_arith_base`
  vec_arith_base(op, vec_data(x), y)
}

# now this works  
x * 2
#> [1] "we went there"
#> [1] 84

# but this doesn't, and doesn't go through vec_arith.foo
x * data.frame(a=1)
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in x * data.frame(a = 1): non-numeric argument to binary operator

# while this works
42 * data.frame(a=1)
#>    a
#> 1 42

How can I make x * data.frame(a=1) return the same as 42 * data.frame(a=1)?

traceback() doesn't return anything so I'm not sure how to debug this.

Moody_Mudskipper asked Jan 10 '21 18:01


1 Answer

It is an intriguing question which caught my interest. I am no expert on this issue, but I found a way to get it working. It’s a rather dirty workaround and no real solution. There should be a better way to solve this issue using the {vctrs} package.

The problem is complicated, because we are dealing with an internal generic * which uses double dispatch (see here). The important part is that:

Generics in the Ops group, which includes the two-argument arithmetic and Boolean operators like - and &, implement a special type of method dispatch. They dispatch on the type of both of the arguments, which is called double dispatch.

It turns out that for a call like x * y, R looks up a method for each of the two arguments, x and y. There are then three possible outcomes:

  1. The methods are the same, so it doesn’t matter which method is used.
  2. The methods are different, and R falls back to the internal method with a warning.
  3. One method is internal, in which case R calls the other method.
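
The three outcomes can be reproduced in base R. The following is only a minimal sketch; the classes "left" and "right" and their methods are made up for the illustration and are not part of the original example:

```r
# Hypothetical classes, used only for this illustration.
a <- structure(1, class = "left")
b <- structure(2, class = "right")

`+.left` <- function(e1, e2) "left method"

# Same method on both sides: it is simply called.
same <- a + a

# Only one argument has a method: that method is called, regardless of
# which side the classed object is on.
one_sided <- c(a + 1, 1 + a)

# Different methods on each side: R warns and falls back to the
# internal `+`, ignoring both methods.
`+.right` <- function(e1, e2) "right method"
msg <- NULL
fallback <- withCallingHandlers(
  a + b,
  warning = function(w) {
    msg <<- conditionMessage(w)   # keep the warning text for inspection
    invokeRestart("muffleWarning")
  }
)

print(same)
print(one_sided)
print(msg)
print(unclass(fallback))
```

The last call returns the plain sum of the underlying numbers, which shows that neither of the two user-defined methods was used.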

Let's keep this in mind when looking at the problem. I first refrained from using the {vctrs} package and tried to reconstruct the problem in two ways. First, I tried to multiply an object of a new class with a list. This reproduces the error from the original example:

# let's create a new object
x1 <- 10
class(x1) <- "myclass"

# and multiply it with a list
l <- list(1)
x1 * l

# same error as in the original example, but without the warning
#> Error in x1 * l: non-numeric argument to binary operator

sloop::s3_dispatch(x1 * l)
#>    *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#> => * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

The {sloop} package shows that the internal * is called in both cases, and the internal * has no way to operate on lists. So let's see whether we can supply our own method:

`*.myclass` <- function(x, y) {
  print("myclass")
  if (is.list(y)) {
    print("if clause")
    y <- unlist(y)
  } else {
    print("didn't use if clause")
  }
  
  x + y # the operation is changed to make it visible that this method is used
}

x1 * l # now working
#> [1] "myclass"
#> [1] "if clause"
#> [1] 11
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * l)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(l * x1)
#>    *.list
#>    *.default
#>    Ops.list
#>    Ops.default
#> => * (internal)

This worked (although we really should not alter the objects inside the method call). Here we have the third case described above: one method is internal, so the non-internal method is called. Unlike data frames, lists have no existing method for arithmetic operations. To reproduce the original problem, we therefore need an example in which two objects of different classes, each with its own method, are multiplied.
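
The difference between data frames and lists can be checked directly. This is a small sketch using utils::getS3method() with optional = TRUE, which returns NULL instead of raising an error when no method is found:

```r
# Data frames have a group method for arithmetic operators ...
df_ops <- utils::getS3method("Ops", "data.frame", optional = TRUE)

# ... while base R provides no such method for lists, neither for the
# Ops group nor for `*` specifically.
list_ops  <- utils::getS3method("Ops", "list", optional = TRUE)
list_star <- utils::getS3method("*", "list", optional = TRUE)

print(is.function(df_ops))
print(is.null(list_ops))
print(is.null(list_star))
```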

# another object
y1 <- 20
class(y1) <- "another_class"

# here we still only have one method `*.myclass`:
x1 * y1 # working
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#>    *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#> => * (internal)

# let's introduce another method:
`*.another_class` <- function(x, y) {
  x - y # again, to see if it is working we change the operation
}

# now we get (only) a warning, but with a different result!
x1 * y1 
#> Warning: Incompatible methods ("*.myclass", "*.another_class") for "*"
#> [1] 200
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

Here we now have the second case described above: the two methods are different, and R falls back to the internal method with a warning. This produces the "unaltered" result 20 * 10 = 200.

So regarding the original problem, my understanding is that we have two conflicting methods, "*.vctrs_vctr" and "Ops.data.frame". For this reason, the internal method * (internal) is called, and this internal method does not accept lists or data frames (handling them is usually done inside Ops.data.frame, which is not used here because of the conflicting methods).
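
That the internal * cannot handle a data frame on its own can be shown roughly in base R. In this sketch, unclass() is used to bypass Ops.data.frame; that is only an approximation of what happens after the dispatch conflict, not the exact code path:

```r
df <- data.frame(a = 1)

# With its class intact, the data frame is handled by Ops.data.frame.
ok <- 42 * df

# Without the class attribute it is a plain list, and the internal `*`
# rejects it. The error message is captured instead of aborting.
err <- tryCatch(
  42 * unclass(df),
  error = function(e) conditionMessage(e)
)

print(ok)
print(err)
```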

library(vctrs)

z <- new_vctr(42, class = "foo")
a <- data.frame(a = 1)

z * a
#> Warning: Incompatible methods ("*.vctrs_vctr", "Ops.data.frame") for "*"
#> Error in z * a: non-numeric argument to binary operator

sloop::s3_dispatch(z * a)
#>    *.foo
#> => *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default 
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Here again, we can see that two different methods exist and therefore, the internal method is used.

The dirty workaround I came up with is to:

  1. create a non-internal generic *,
  2. explicitly define *.foo, and
  3. explicitly define *.numeric, which will be called once the objects are "unclassed" with vec_data().

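# mask the internal generic with an ordinary S3 generic
`*` <- function(x, y) {
  UseMethod("*")
}

# strip the class with vec_data(), then call the base version of `*`
`*.foo` <- function(x, y) {
  op_fn <- getExportedValue("base", "*")
  op_fn(vec_data(x), vec_data(y))
}

# reached for plain numeric x; forwards to the base version of `*`
`*.numeric` <- function(x, y) {
  print("numeric")
  fn <- getExportedValue("base", "*")
  fn(x, y)
}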

z * a
#> [1] "numeric"
#>    a
#> 1 42

sloop::s3_dispatch(z * a)
#> => *.foo
#>  * *.vctrs_vctr
#>    *.default
#>    Ops.foo
#>    Ops.vctrs_vctr
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(a * z)
#>    *.data.frame
#>    *.default
#> => Ops.data.frame
#>    Ops.default
#>  * * (internal)

Created on 2021-01-13 by the reprex package (v0.3.0)

Unfortunately, I am not 100% sure what is happening. It seems that overriding the * generic also overrides the way R handles double dispatch for this generic. Let's revisit the multiplication of two objects of different classes, x1 * y1, from above. Earlier, a method was found for each argument, and since the two methods differed, a warning was issued and the internal method was used. Now we observe the following:

x1 * y1 # working without warning
#> [1] "myclass"
#> [1] "didn't use if clause"
#> [1] 30
#> attr(,"class")
#> [1] "myclass"

sloop::s3_dispatch(x1 * y1)
#> => *.myclass
#>    *.default
#>    Ops.myclass
#>    Ops.default
#>  * * (internal)

sloop::s3_dispatch(y1 * x1)
#> => *.another_class
#>    *.default
#>    Ops.another_class
#>    Ops.default
#>  * * (internal)

We have two conflicting methods, and still R chooses the method of the first object without issuing a warning.
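
One plausible reading, offered as a sketch and not as a confirmed explanation: once * is an ordinary function that calls UseMethod(), dispatch happens on the first argument only, which is how UseMethod() behaves for any closure. The example below uses a hypothetical generic times() and hypothetical classes "left" and "right", so that * does not have to be masked again:

```r
# Hypothetical generic and classes, used only for this illustration.
times <- function(x, y) UseMethod("times")
times.left  <- function(x, y) "left method"
times.right <- function(x, y) "right method"

a <- structure(1, class = "left")
b <- structure(2, class = "right")

# UseMethod() looks only at the class of the first argument, so the
# differing method of the second argument is never considered and no
# warning is raised.
first  <- times(a, b)
second <- times(b, a)

print(first)
print(second)
```

If this reading is right, the masked * no longer performs double dispatch at all, which would match the missing warning.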

This is of course not a real solution to the problem, for many reasons:

  1. Overriding the generics of arithmetic operations doesn't seem to be a good idea, since it is likely to break code.
  2. We would also need to deal with data.frame(a = 1) * z, which still doesn't work (here we would need to override the existing code of Ops.data.frame).
  3. We shouldn't need to write methods for each arithmetic operation.
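
On the third point: for a plain S3 class (not a {vctrs} class, which brings its own operator methods), a single Ops group method covers all arithmetic and comparison operators at once. The following is a minimal base R sketch, assuming a hypothetical class "myclass2". It does not resolve the conflict with Ops.data.frame, because the two methods would still differ:

```r
# Hypothetical class "myclass2", used only for this illustration.
Ops.myclass2 <- function(e1, e2) {
  op <- get(.Generic, envir = baseenv())   # base version of the operator
  if (nargs() == 1L) {
    return(op(unclass(e1)))                # unary case, e.g. -x
  }
  # Strip the class from whichever side carries it, then proceed normally.
  if (inherits(e1, "myclass2")) e1 <- unclass(e1)
  if (inherits(e2, "myclass2")) e2 <- unclass(e2)
  op(e1, e2)
}

v <- structure(10, class = "myclass2")

print(v * 2)
print(2 + v)
print(-v)
print(v > 5)
```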

The {vctrs} package should help us find a simpler and safer solution, and maybe one exists already. It might be worth opening an issue on GitHub.

TimTeaFan answered Oct 28 '22 04:10