Spark atop Docker not accepting jobs

I'm trying to make a hello-world example work with Spark + Docker; here is my code.

import org.apache.spark.SparkContext

object Generic {
  def main(args: Array[String]) {
    // Standalone master running in a Docker container
    val sc = new SparkContext("spark://172.17.0.3:7077", "Generic", "/opt/spark-0.9.0")

    // Monte Carlo estimate of Pi
    val NUM_SAMPLES = 100000
    val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
      val x = Math.random * 2 - 1
      val y = Math.random * 2 - 1
      if (x * x + y * y < 1) 1.0 else 0.0
    }.reduce(_ + _)

    println("Pi is roughly " + 4 * count / NUM_SAMPLES)
  }
}

When I run sbt run, I get

14/05/28 15:19:58 INFO client.AppClient$ClientActor: Connecting to master spark://172.17.0.3:7077...
14/05/28 15:20:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

I checked both the cluster UI, where I have 3 nodes each with 1.5 GB of memory, and the namenode UI, where I see the same thing.

The Docker logs show no output from the workers, and the following from the master:

14/05/28 21:20:38 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@master:7077] -> [akka.tcp://spark@10.0.3.1:48085]: Error [Association failed with [akka.tcp://spark@10.0.3.1:48085]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@10.0.3.1:48085]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: /10.0.3.1:48085

]

This happens a couple of times, and then the program times out and dies with:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Spark cluster looks down

When I ran tcpdump on the docker0 interface, it looked like the workers and the master were talking to each other.

However, the spark console works.

If I create the context as val sc = new SparkContext("local", "Generic", System.getenv("SPARK_HOME")), the program runs.

Asked May 28 '14 by Peter Klipfel


1 Answer

I've been there. It looks like the Akka actor subsystem in Spark is binding to a different interface than docker0.

While your master is at: spark://172.17.0.3:7077

Akka is binding to: akka.tcp://spark@10.0.3.1:48085

If your masters/slaves are Docker containers, they should be communicating through the docker0 interface in the 172.17.x.x range.

Try providing the master and slaves with their correct local IP using the environment variable SPARK_LOCAL_IP. See the configuration docs for details.
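For example, something along these lines (a rough sketch: 172.17.0.3 is your master's address from the question, while 172.17.0.4 is just a placeholder for one of the worker containers):

# on the master container, before starting the master daemon
export SPARK_LOCAL_IP=172.17.0.3

# on each worker container, before starting its worker
export SPARK_LOCAL_IP=172.17.0.4   # that container's own 172.17.x.x address

You can also put SPARK_LOCAL_IP in conf/spark-env.sh so the launch scripts pick it up automatically.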

In our Docker setup for Spark 0.9, we use this command to start the slaves:

${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_IP -i $LOCAL_IP 

This directly provides the local IP to the worker.
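For instance, a worker pointed at your master would be started with something like this (172.17.0.4 is again just a placeholder for the worker container's own address):

${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.worker.Worker spark://172.17.0.3:7077 -i 172.17.0.4

With both the master URL and the -i address on the 172.17.x.x docker0 network, Akka advertises an address that the other containers can actually reach.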

Answered Oct 11 '22 by maasg