After spending a long time working out how to install SparkR, I think there may be some issues with the package.
Please bear in mind that I am very new to Spark, so I am not sure whether I have done the right thing or not.
From a fresh EC2 Ubuntu 64-bit instance I installed R and the JDK.
I then cloned the Apache Spark repo and built it with:
git clone https://github.com/apache/spark.git
cd spark
build/mvn -DskipTests -Psparkr package
I then changed my .Rprofile to reference the SparkR library directory by adding the following lines:
Sys.setenv(SPARK_HOME="/home/ubuntu/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Then, after starting R, I tried to run through the quick start guide given here.
Below are the steps I took:
R> library(SparkR)
R> sc <- sparkR.init(master="local")
R> textFile <- SparkR:::textFile(sc, "/home/ubuntu/spark/README.md")
R> cc <- SparkR:::count(textFile)
R> t10 <- SparkR:::take(textFile,10)
Everything works fine up to this point; the lines below do not work:
R> SparkR:::filterRDD(textFile, function(line){ grepl("Spark", line)})
Error: class(objId) == "jobj" is not TRUE
R> traceback()
7: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
6: stopifnot(class(objId) == "jobj")
5: callJMethod(object@jrdd, "toString")
4: paste(callJMethod(object@jrdd, "toString"), "\n", sep = "")
3: cat(paste(callJMethod(object@jrdd, "toString"), "\n", sep = ""))
2: function (object)
standardGeneric("show")(x)
1: function (object)
standardGeneric("show")(x)
Another example that does not work is below.
R> SparkR:::flatMap(textFile,
function(line) {
strsplit(line, " ")[[1]]
})
Error: class(objId) == "jobj" is not TRUE
Below is my session info:
R> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SparkR_1.4.0
Any help here would be greatly appreciated.
The map() transformation takes a function and applies it to each element of the RDD; the result of the function becomes the new value of that element in the resulting RDD, so the output has exactly one element per input element. flatMap() is similar, but the supplied function may return zero, one, or more elements per input element, and the results are flattened into a single RDD, so the element count of the output can differ from that of the input.
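To make the difference concrete, here is a minimal sketch against the same private RDD API (SparkR:::parallelize and SparkR:::map are assumptions on my part, present in the 1.4 codebase alongside the flatMap and count calls used above); results are assigned to variables to avoid the auto-printing issue discussed below:
R> lines <- SparkR:::parallelize(sc, list("a b", "c d e"))
R> mapped <- SparkR:::map(lines, function(line) { strsplit(line, " ")[[1]] })
R> SparkR:::count(mapped)  # 2 -- one output element (a character vector) per input line
R> words <- SparkR:::flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
R> SparkR:::count(words)   # 5 -- "a", "b", "c", "d", "e"; the vectors are flattened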
This is actually a bug in the show method of the RDD class in SparkR, and I have documented it at https://issues.apache.org/jira/browse/SPARK-7512
However, this bug should not affect your computation in any way. So if you instead use
filteredRDD <- SparkR:::filterRDD(textFile, function(line){ grepl("Spark", line)})
then the error message should go away.
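As a quick sanity check that the underlying computation is unaffected, you can run an action on the assigned RDD using only the calls from the question; only auto-printing the RDD itself hits the show bug:
R> filteredRDD <- SparkR:::filterRDD(textFile, function(line){ grepl("Spark", line)})
R> SparkR:::count(filteredRDD)  # returns the number of lines containing "Spark" without error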