I'm trying to convert this Scala expression to Java:
val corpus: RDD[String] = sc.wholeTextFiles("docs/*.md").map(_._2)
This is what I have in Java:
RDD<String> corpus = sc.wholeTextFiles("docs/*.md").map(a -> a._2);
But I get an error on a._2.
If I go to the "super" method, this is what I see:
package org.apache.spark.api.java.function;
import java.io.Serializable;
public interface Function<T1, R> extends Serializable {
R call(T1 var1) throws Exception;
}
In Scala, each element of a pair RDD is a tuple (Tuple2), and you can access its members with _1 and _2. Java has no built-in tuples, so from Java you reach these members through accessor methods, and Java always requires parentheses on a method call. It should look like this:
JavaRDD<String> corpus = sc.wholeTextFiles("docs/*.md").map(a -> a._2());
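To illustrate why the parentheses matter, here is a minimal sketch that runs without Spark. The Pair class is a hypothetical stand-in for scala.Tuple2, and a plain java.util.stream pipeline stands in for the RDD; neither name comes from the Spark API. The only point is that the members are methods, so the lambda has to call p._2() rather than read p._2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for scala.Tuple2: the two elements are exposed
// through accessor methods _1() and _2(), not through public fields.
final class Pair<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A _1() { return first; }
    B _2() { return second; }
}

class TupleAccessSketch {
    // Keeps only the second element of each pair, analogous to map(a -> a._2()).
    static List<String> contents(List<Pair<String, String>> files) {
        return files.stream()
                .map(p -> p._2())   // p._2 without parentheses would not compile
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // (path, content) pairs, shaped like the output of wholeTextFiles
        List<Pair<String, String>> files = Arrays.asList(
                new Pair<>("docs/a.md", "# A"),
                new Pair<>("docs/b.md", "# B"));
        System.out.println(contents(files));
    }
}
```

With the real Tuple2 the lambda body is the same; only the collection type differs.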
Edit: It seems that in Scala an implicit parameter (a ClassTag) is passed to the map method, which means you would have to pass it explicitly when calling the Scala RDD from Java. See the Java and Scala API documentation for map.
Edit 2: After a few hours of fumbling, the answer was found: it had to be a JavaRDD.
You should be able to use values() to get the result you want in Java here:
JavaRDD<String> corpus = sc.wholeTextFiles("docs/*.md").values();
Note that the type here is JavaRDD, not RDD.