
Matrix Math With Sparklyr

I'm looking to convert some R code to sparklyr, in particular functions such as lmtest::coeftest() and sandwich::sandwich(). I'm trying to get started with sparklyr extensions, but I'm pretty new to the Spark API and am having issues :(

Running Spark 2.1.1 and sparklyr 0.5.5-9002

I think the first step would be to create a DenseMatrix object using the linalg library:

library(sparklyr)
library(dplyr)
sc <- spark_connect("local")

rows <- as.integer(2)
cols <- as.integer(2)
array <- c(1,2,3,4)

mat <- invoke_new(sc, "org.apache.spark.mllib.linalg.DenseMatrix", 
                  rows, cols, array)

This results in the error:

Error: java.lang.Exception: No matched constructor found for class org.apache.spark.mllib.linalg.DenseMatrix

Okay, so I got a Java exception. I'm pretty sure the rows and cols arguments to the constructor were fine, but I'm not sure about the last one, which is supposed to be a Java array. So I tried a few permutations of:

array <- invoke_new(sc, "java.util.Arrays", c(1,2,3,4))

but I end up with a similar error message:

Error: java.lang.Exception: No matched constructor found for class java.util.Arrays

I feel like I'm missing something pretty basic. Anyone know what's up?

Zafar asked Jun 17 '17 06:06


1 Answer

The R counterpart of a Java array is a list:

invoke_new(
  sc, "org.apache.spark.ml.linalg.DenseMatrix",
  2L, 2L, list(1, 2, 3, 4))

## <jobj[17]>
##   class org.apache.spark.ml.linalg.DenseMatrix
##   1.0  3.0  
## 2.0  4.0  

or

invoke_static(
  sc, "org.apache.spark.ml.linalg.Matrices", "dense",
  2L, 2L, list(1, 2, 3, 4))

## <jobj[19]>
##   class org.apache.spark.ml.linalg.DenseMatrix
##   1.0  3.0  
## 2.0  4.0 

Please note that I am using o.a.s.ml.linalg instead of o.a.s.mllib.linalg. While mllib would work in isolation, as of Spark 2.x the o.a.s.ml algorithms no longer accept local o.a.s.mllib types.

At the same time, R atomic vector types (numeric, integer, character) are passed as scalars, which is why c(1, 2, 3, 4) does not match a constructor that expects an array.
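To illustrate the two rules together (a list for a Java array, plain R values for scalars), here is a minimal sketch that turns a base R matrix into a local DenseMatrix and multiplies it by its own transpose on the JVM. The helper names dense_args, to_spark_dense and spark_gram are hypothetical and not part of sparklyr; the sketch also assumes that the transpose and multiply methods of o.a.s.ml.linalg.DenseMatrix resolve through invoke() the same way the constructor does.

```r
# Hypothetical helpers; none of these names are part of sparklyr.

# Flatten a base R matrix into the arguments Matrices.dense expects.
# R stores matrices column-major, the same layout Spark uses, so the
# values can be passed through unchanged as a list (-> Java double[]).
dense_args <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m))
  list(nrow = nrow(m), ncol = ncol(m), values = as.list(as.numeric(m)))
}

# Build a local o.a.s.ml.linalg.DenseMatrix on the JVM side.
# nrow() and ncol() already return integers, so no 2L-style cast is needed.
to_spark_dense <- function(sc, m) {
  a <- dense_args(m)
  sparklyr::invoke_static(
    sc, "org.apache.spark.ml.linalg.Matrices", "dense",
    a$nrow, a$ncol, a$values)
}

# Compute m %*% t(m) on the JVM and return the jobj of the result.
# Assumes `sc` is an open connection, e.g. from spark_connect("local").
spark_gram <- function(sc, m) {
  jm <- to_spark_dense(sc, m)
  sparklyr::invoke(jm, "multiply", sparklyr::invoke(jm, "transpose"))
}
```

With an open connection, spark_gram(sc, matrix(c(1, 2, 3, 4), nrow = 2)) should give back a jobj wrapping a 2 x 2 DenseMatrix, and invoke(result, "toArray") can be used to pull the values back into R. Each invoke() is a separate round trip to the JVM, so this style only suits small, simple operations.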

Note:

Personally, I believe this is not the way to go. The Spark linalg packages are quite limited, and internally they depend on libraries that won't be usable via sparklyr. Moreover, the sparklyr API is not suitable for complex logic.

In practice, it makes more sense to implement a Java or Scala extension with a thin, R-friendly wrapper.

zero323 answered Sep 21 '22 14:09