I have started to port my PySpark application to Java. I am using Java 8 and have just started running some basic Spark programs in Java. I used the following word count example.
SparkConf conf = new SparkConf().setMaster("local").setAppName("Word Count App");
// Create a Java version of the Spark Context from the configuration
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile(filename);
JavaPairRDD<String, Integer> counts = lines.flatMap(line -> Arrays.asList(line.split(" ")))
.mapToPair(word -> new Tuple2(word, 1))
.reduceByKey((x, y) -> (Integer) x + (Integer) y)
.sortByKey();
I am getting the error
Type mismatch: cannot convert from JavaRDD<Object> to JavaRDD<String>
on lines.flatMap(line -> Arrays.asList(line.split(" "))).
When I googled, all of the Java 8 based Spark examples I found used this same implementation. What went wrong, in my environment or in the program?
Can someone help me?
Use the code below. The actual issue is that, as of Spark 2.0, JavaRDD.flatMap expects the lambda to return an Iterator<String>, while your code returns a List<String>. Spark 1.x accepted an Iterable, which is why most of the examples you found online still use the old form. Calling iterator() on the list fixes the problem.
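For reference, here is a paraphrased sketch of the Spark 2.x org.apache.spark.api.java.function.FlatMapFunction interface; the names are real, but the body below is a simplified illustration rather than the exact source:

import java.io.Serializable;
import java.util.Iterator;

// Paraphrased sketch of Spark 2.x's FlatMapFunction.
// In Spark 1.x, call returned Iterable<R>, so Arrays.asList(...) alone compiled.
public interface FlatMapFunction<T, R> extends Serializable {
    Iterator<R> call(T t) throws Exception;
}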
With that change, the pipeline compiles and runs:

JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) // returns Iterator<String>, as Spark 2.x expects
        .mapToPair(word -> new Tuple2<String, Integer>(word, 1))    // emit a (word, 1) pair for each word
        .reduceByKey((x, y) -> x + y)                               // sum the counts for each word
        .sortByKey();                                               // sort the results alphabetically by word
counts.foreach(data -> {
    System.out.println(data._1() + "-" + data._2());
});
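For completeness, here is a minimal self-contained sketch of the whole program, assuming Spark 2.x with a local master; the class name WordCountApp and the path input.txt are placeholders:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCountApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Word Count App");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // input.txt is a placeholder; point this at your own file.
        JavaRDD<String> lines = sc.textFile("input.txt");

        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                .mapToPair(word -> new Tuple2<String, Integer>(word, 1))
                .reduceByKey((x, y) -> x + y)
                .sortByKey();

        counts.foreach(data -> System.out.println(data._1() + "-" + data._2()));

        sc.close();
    }
}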