Spark NotSerializableException

In my Spark code, I am attempting to create an IndexedRowMatrix from a CSV file. However, I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
...
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext

Here is my code:

sc = new JavaSparkContext("local", "App",
              "/srv/spark", new String[]{"target/App.jar"});

JavaRDD<String> csv = sc.textFile("data/matrix.csv").cache();


JavaRDD<IndexedRow> entries = csv.zipWithIndex().map(
    new Function<scala.Tuple2<String, Long>, IndexedRow>() {
        private static final long serialVersionUID = 4795273163954440089L;

        @Override
        public IndexedRow call(Tuple2<String, Long> tuple)
                throws Exception {
            String line = tuple._1;
            long index = tuple._2;
            String[] strings = line.split(",");
            double[] doubles = new double[strings.length];
            for (int i = 0; i < strings.length; i++) {
                doubles[i] = Double.parseDouble(strings[i]);
            }
            Vector v = new DenseVector(doubles);
            return new IndexedRow(index, v);
        }
    });
user1330691 asked Jun 14 '15 11:06


1 Answer

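I had the same issue, and it drove me around the bend. The cause is how Java handles anonymous classes and serialization: an anonymous class created inside an instance method keeps an implicit reference to its enclosing object, so serializing the Function also tries to serialize that object and its fields, which here apparently includes the JavaSparkContext, and that class is not serializable. My solution was to replace the anonymous Function with a named static nested class that implements Serializable, and to pass an instance of that class to map. In practice I declared a function library: an outer class containing static nested class definitions for the functions I wanted to use.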
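To illustrate the mechanism, here is a minimal sketch that uses plain java.io serialization instead of Spark, so it runs on its own. All names in it (SerializationDemo, SerializableFunction, FakeContext, App, ParseLine, failingClass) are hypothetical stand-ins and not part of the original code: FakeContext plays the role of JavaSparkContext, and SerializableFunction plays the role of Spark's Function interface.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {

    /** Hypothetical stand-in for Spark's Function interface, which is Serializable. */
    interface SerializableFunction<T, R> extends Serializable {
        R call(T input) throws Exception;
    }

    /** Hypothetical stand-in for JavaSparkContext: deliberately not Serializable. */
    static class FakeContext { }

    /** Mimics a driver class that keeps the context in an instance field. */
    static class App implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakeContext sc = new FakeContext();

        String[] splitLine(String line) {
            return line.split(",");
        }

        /** Anonymous class: carries a hidden reference to the enclosing App. */
        SerializableFunction<String, double[]> anonymousParser() {
            return new SerializableFunction<String, double[]>() {
                private static final long serialVersionUID = 2L;

                @Override
                public double[] call(String line) {
                    return toDoubles(splitLine(line)); // calls a method on App.this
                }
            };
        }
    }

    /** Static nested class: no reference to any enclosing instance. */
    static class ParseLine implements SerializableFunction<String, double[]> {
        private static final long serialVersionUID = 3L;

        @Override
        public double[] call(String line) {
            return toDoubles(line.split(","));
        }
    }

    static double[] toDoubles(String[] parts) {
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    /** Returns the name of the class that blocked serialization, or null on success. */
    static String failingClass(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The anonymous function drags App (and its FakeContext field) along.
        System.out.println("anonymous: " + failingClass(new App().anonymousParser()));
        // The static nested function serializes on its own.
        System.out.println("static:    " + failingClass(new ParseLine()));
    }
}
```

Applied to the question's code, the equivalent change would be to move the body of the anonymous Function into a static nested class and pass a new instance of it to map.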

Of course, if you wrote it in Scala, it would most probably be a single file with much neater code, but that is not going to help you in this instance.

Beezer answered Oct 20 '22 18:10
