Invalid Spark URL in local spark session

Tags:

apache-spark

Since updating to Spark 2.3.0, tests run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when the (local) Spark context is created:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before updating to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Also, running the tests locally works fine.

Asked Mar 07 '18 by Lorenz Bernauer

3 Answers

Change the SPARK_LOCAL_HOSTNAME environment variable to localhost and try again:

export SPARK_LOCAL_HOSTNAME=localhost
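
This works because the CI container's hostname (LXC_trusty_1802) contains an underscore, which the stricter RPC address validation in Spark 2.3 appears to reject; SPARK_LOCAL_HOSTNAME overrides the hostname Spark advertises. If the tests run through sbt, the variable can also be set in the build definition instead of the shell. A minimal sketch, assuming an sbt build (envVars is only passed to forked test JVMs):

// build.sbt
Test / fork := true  // required: envVars only applies to forked JVMs
Test / envVars := Map("SPARK_LOCAL_HOSTNAME" -> "localhost")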
Answered by Prakash Annadurai


This was resolved by setting the SparkSession config "spark.driver.host" to the IP address.

It seems that this setting is required from 2.3 onwards.
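
Applied to the builder from the question, it might look like this (a sketch using localhost in place of a hard-coded IP; adjust for your environment):

import org.apache.spark.sql.SparkSession

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.driver.host", "localhost")  // pin the driver host so the RPC URL stays valid
  .master("local[3]")
  .getOrCreate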

Answered by Nagireddy Hanisha


If you don't want to change the environment variable, you can instead add the config to the SparkSession builder in code, as Hanisha said above.

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()
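
The same setting can also be supplied at launch time instead of in code, e.g. when submitting with spark-submit (my_app.py is just a placeholder script name):

spark-submit --conf spark.driver.host=localhost my_app.py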
Answered by Felipe Zschornack