
Why does Apache Spark not work with Java 10? We get an illegal reflective access warning and then java.lang.IllegalArgumentException


Is there any technical reason why Spark 2.3 does not work with Java 10 (as of July 2018)?

Here is the output when I run the Python Pi example using spark-submit.

$ ./bin/spark-submit ./examples/src/main/python/pi.py 
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-07-13 14:31:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-13 14:31:31 INFO  SparkContext:54 - Running Spark version 2.3.1
2018-07-13 14:31:31 INFO  SparkContext:54 - Submitted application: PythonPi
2018-07-13 14:31:31 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 58681.
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering MapOutputTracker
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering BlockManagerMaster
2018-07-13 14:31:31 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2018-07-13 14:31:31 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2018-07-13 14:31:31 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/mp/9hp4l4md4dqgmgyv7g58gbq0ks62rk/T/blockmgr-d24fab4c-c858-4cd8-9b6a-97b02aa630a5
2018-07-13 14:31:31 INFO  MemoryStore:54 - MemoryStore started with capacity 434.4 MB
2018-07-13 14:31:31 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
...
2018-07-13 14:31:32 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
Traceback (most recent call last):
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/./examples/src/main/python/pi.py", line 44, in <module>
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 862, in reduce
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in collect
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "~/Documents/spark-2.3.1-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
    at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:938)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:162)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:844)

2018-07-13 14:31:33 INFO  SparkContext:54 - Invoking stop() from shutdown hook
...

I resolved the issue by switching to Java 8 instead of Java 10, as mentioned here.
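
For reference, a minimal sketch of that switch on macOS (assuming /usr/libexec/java_home is available and a Java 8 JDK is installed; paths and commands will differ on other systems):

$ export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)    # point spark-submit at a Java 8 JDK
$ "$JAVA_HOME/bin/java" -version                       # should now report a 1.8.x runtime
$ ./bin/spark-submit ./examples/src/main/python/pi.py  # re-run the Pi example under Java 8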

asked Jul 13 '18 by mehdi


People also ask

Does Spark work with JDK 11?

Spark runs on Java 8/11, Scala 2.12, Python 2.7+/3.4+ and R 3.1+. Support for Java 8 releases prior to 8u92 is deprecated as of Spark 3.0.

Is Java 9 compatible with Spark?

Warning - Java 9+ is not supported by Spark 2. You can optionally use Spark 3.

Which version of Java is supported by Spark?

Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.

Does Apache Spark work with Java 17?

Apache Spark supports Java 8 and Java 11 (LTS). The next Java LTS version is 17, released in September 2021. Apache Spark has a release plan, and the Spark 3.2 code freeze was in July, along with the release branch cut.
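
If you are not sure which Java release spark-submit will pick up on your machine, a quick check (output varies by system) is shown below; for Spark 2.3 both should point at a 1.8.x (Java 8) installation:

$ java -version       # the JVM on your PATH
$ echo "$JAVA_HOME"   # the JDK that Spark's launcher scripts prefer, if set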

What is Apache Spark and how does it work?

Spark has been called a “general purpose distributed data processing engine”¹ and “a lightning fast unified analytics engine for big data and machine learning”². It lets you process big data sets faster by splitting the work up into chunks and assigning those chunks across computational resources.

When would you not want to use spark?

When would you not want to use Spark? For multi-user systems, with shared memory, Hive may be a better choice ². For real time, low latency processing, you may prefer Apache Kafka ⁴. With small data sets, it’s not going to give you huge gains, so you’re probably better off with the typical libraries and tools.

What is IllegalArgumentException in Java?

A method raises IllegalArgumentException when it is passed an argument that is invalid for its purpose: for example, a file that has already been closed is passed to a method that is supposed to read from it, or a method that requires a non-empty string is given null. In short, whenever a method receives an invalid argument, it may raise IllegalArgumentException.
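
A quick way to see one (an illustrative jshell session; jshell ships with JDK 9+): Thread.sleep rejects a negative timeout.

$ jshell -q
jshell> Thread.sleep(-1)   // invalid argument: throws java.lang.IllegalArgumentException ("timeout value is negative")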

What are the advantages of Apache Spark over MapReduce?

High-speed querying, analysis, and transformation of large data sets. Compared to MapReduce, Spark does far less reading and writing to and from disk, and it runs multi-threaded tasks (threads that share the resources of one or more cores) within Java Virtual Machine (JVM) processes.


1 Answer

The primary technical reason is that Spark depends heavily on direct access to native memory via sun.misc.Unsafe, access to which is restricted starting with Java 9.

  • https://issues.apache.org/jira/browse/SPARK-24421
  • http://apache-spark-developers-list.1001551.n3.nabble.com/Java-9-td20875.html
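
Note that the WARNING lines in the log and the java.lang.IllegalArgumentException are related but distinct. JVM switches such as --illegal-access=warn (mentioned in the log itself) only change how loudly the JVM reports illegal reflective access; passing them to the driver does not make Spark 2.3 run on Java 10, which is why downgrading to Java 8 remains the practical fix. A hedged sketch of such an attempt:

$ ./bin/spark-submit \
    --driver-java-options "--illegal-access=warn" \
    ./examples/src/main/python/pi.py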
answered Sep 24 '22 by user10077548