Since updating to Spark 2.3.0, tests run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:
18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
The Spark session is created as follows:
val sparkSession: SparkSession = SparkSession
  .builder
  .appName(s"LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate
Before updating to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Running the tests locally also works fine.
Change the SPARK_LOCAL_HOSTNAME environment variable to localhost and try again:
export SPARK_LOCAL_HOSTNAME=localhost
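For example, assuming an sbt-based build, the test run in the CI job can be wrapped so the variable is set for that command only:
SPARK_LOCAL_HOSTNAME=localhost sbt test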
This was resolved by setting the SparkSession config "spark.driver.host" to the IP address. It seems this is required from Spark 2.3 onwards, likely because the stricter Spark URL validation rejects hostnames containing underscores (as in LXC_trusty_1802-d57a40eb above).
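Applied to the Scala builder from the question, a minimal sketch would look like the following. Pinning the driver host to "localhost" is an assumption that suits a local[*] test session; use the machine's IP address instead if the driver must be reachable from other hosts.

import org.apache.spark.sql.SparkSession

// Set the driver host explicitly so Spark does not derive it from the
// machine hostname; "localhost" is assumed to be fine for local tests.
val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.driver.host", "localhost")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate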
If you don't want to change the environment variable, you can instead add the config to the SparkSession builder (as Hanisha said above).
In PySpark:
from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()