
apache zeppelin: java.lang.NullPointerException

When running any kind of command in Zeppelin, I'm getting a "java.lang.NullPointerException" error - even for simple things like sc.appName. Here's the full stack trace:

java.lang.NullPointerException
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

The error seems to point to something with Spark, but I have the Spark location set correctly in zeppelin-env.sh:

export SPARK_HOME=/usr/local/spark

The only other fields that I've modified are as follows:

export HADOOP_CONF_DIR=/home/cloudera/hadoop/etc/hadoop
export PYSPARK_PYTHON=/usr/bin/python
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/

My Hadoop install doesn't have a "conf" folder, but the yarn-site file is in the indicated location. I'm using anonymous login; not sure if that's relevant. I can run the Spark shell successfully from the command line. I did try to search around, since it's such a common error, but nothing seemed to quite fit this situation. I can also provide the full zeppelin-env.sh file if needed. Thanks in advance for any assistance!

asked Nov 26 '22 by lengthy_preamble


1 Answer

That "something with Spark" is exactly what pointed me to the fix; my cluster works now. With no Spark configured in Zeppelin, everything worked; as soon as I switched to my cluster config, it stopped working. All of the versions involved have to fit together, namely:

  • the Zeppelin Spark interpreter
  • any Zeppelin-local Spark installation (wherever Zeppelin's SPARK_HOME points)
  • the remote Spark master/cluster installation

Otherwise you'll see connection errors, deserialization errors, and the like on the Spark side (at least, that was the case for me).

In Zeppelin's interpreter log file, look for "Running Spark version"; that is the Spark version actually being used. I hope that helps!
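As a minimal sketch of that version check: Spark generally requires the client and cluster to be on the same major.minor release, so comparing the version reported by your SPARK_HOME's spark-submit --version against the "Running Spark version" line in Zeppelin's interpreter log is usually enough. The helper below is a hypothetical illustration of that comparison (the function name and the example version strings are my own, not from Zeppelin):

```shell
# Hypothetical helper: check whether two Spark version strings share the
# same major.minor release (e.g. 2.4.8 and 2.4.0 do; 2.4.8 and 3.0.1 don't).
same_spark_release() {
  local a="${1%.*}"   # strip the patch component: 2.4.8 -> 2.4
  local b="${2%.*}"
  [ "$a" = "$b" ]
}

# Example comparison with made-up version strings:
same_spark_release 2.4.8 2.4.0 && echo "versions compatible" || echo "version mismatch"
```

In practice you would feed it the output of "$SPARK_HOME/bin/spark-submit --version" on one side and the version from the "Running Spark version" log line on the other.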

answered Apr 08 '23 by Frischling