
Spark cluster Master IP address not binding to floating IP

I'm trying to configure a Spark cluster using OpenStack. Currently I have two servers named

  • spark-master (IP: 192.x.x.1, floating IP: 87.x.x.1)
  • spark-slave-1 (IP: 192.x.x.2, floating IP: 87.x.x.2)

I run into problems when I try to use the floating IPs instead of the instances' fixed (192.x.x.x) IPs.

On the spark-master machine, the hostname is spark-master and /etc/hosts looks like

127.0.0.1 localhost
127.0.1.1 spark-master

The only change made to spark-env.sh is export SPARK_MASTER_IP='192.x.x.1'. If I run ./sbin/start-master.sh I can view the web UI.

However, I reach the web UI through the floating IP 87.x.x.1, and it lists the Master URL as spark://192.x.x.1:7077.

From the slave I can run ./sbin/start-slave.sh spark://192.x.x.1:7077 and it connects successfully.

If I try to use the floating IP by changing spark-env.sh on the master to export SPARK_MASTER_IP='87.x.x.1' then I get the following error log

Spark Command: /usr/lib/jvm/java-7-openjdk-amd64/bin/java -cp /usr/local/spark-1.6.1-bin-hadoop2.6/conf/:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/spark-1.6.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 87.x.x.1 --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 15:05:33 INFO Master: Registered signal handlers for [TERM, HUP, INT]
16/05/12 15:05:33 WARN Utils: Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/05/12 15:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 15:05:33 INFO SecurityManager: Changing view acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: Changing modify acls to: ubuntu
16/05/12 15:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/12 15:05:33 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:463)
  at sun.nio.ch.Net.bind(Net.java:455)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
  at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
  at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
  at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
  at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
  at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
  at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
  at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
  at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
  at java.lang.Thread.run(Thread.java:745)

The key part for me seems to be these two lines:

16/05/12 15:05:33 WARN Utils: Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)
16/05/12 15:05:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

but whatever approach I then try, I just run into more errors.

If I set both export SPARK_MASTER_IP='87.x.x.1' and export SPARK_LOCAL_IP='87.x.x.1' and try ./sbin/start-master.sh I get the following error log

16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7077. Attempting port 7078.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7078. Attempting port 7079.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7079. Attempting port 7080.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7080. Attempting port 7081.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7081. Attempting port 7082.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7082. Attempting port 7083.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7083. Attempting port 7084.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7084. Attempting port 7085.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7085. Attempting port 7086.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7086. Attempting port 7087.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7087. Attempting port 7088.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7088. Attempting port 7089.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7089. Attempting port 7090.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7090. Attempting port 7091.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7091. Attempting port 7092.
16/05/17 11:00:55 WARN Utils: Service 'sparkMaster' could not bind on port 7092. Attempting port 7093.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries!

This happens even though my security group rules appear to be correct:

ALLOW IPv4 443/tcp from 0.0.0.0/0
ALLOW IPv4 80/tcp from 0.0.0.0/0
ALLOW IPv4 8081/tcp from 0.0.0.0/0
ALLOW IPv4 8080/tcp from 0.0.0.0/0
ALLOW IPv4 18080/tcp from 0.0.0.0/0
ALLOW IPv4 7077/tcp from 0.0.0.0/0
ALLOW IPv4 4040/tcp from 0.0.0.0/0
ALLOW IPv4 to 0.0.0.0/0
ALLOW IPv6 to ::/0
ALLOW IPv4 22/tcp from 0.0.0.0/0
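
Security group rules filter traffic to and from the instance; they do not decide whether a process on the instance can bind to a given address. To test reachability separately from binding, a minimal Python sketch along these lines could be run from another machine. The function name `port_is_reachable` and the timeout value are illustrative assumptions, not part of Spark or OpenStack.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage from the slave, once the master is running:
#   port_is_reachable("192.x.x.1", 7077)   # fixed IP, inside the private network
#   port_is_reachable("87.x.x.1", 7077)    # floating IP, from outside
```

If the port is reachable through the fixed IP but the master still refuses to start with the floating IP, the security group is probably not the cause.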
Philip O'Brien asked May 12 '16 15:05



2 Answers

I've set up a Spark standalone cluster on OpenStack myself. In the /etc/hosts file on my master, spark-master maps to the private IP rather than to a loopback address such as 127.0.0.1:

127.0.0.1 localhost
192.168.1.2 spark-master

Since my master and slaves share a virtual private network, I only work with the private IPs. The only time I use the floating IP is on my host computer, when I launch spark-submit --master spark://spark-master (there, spark-master resolves to the floating IP). I don't think you need to bind to the floating IP. I hope that helps!
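
One way to see why binding to the floating IP fails is to try the bind outside Spark. The sketch below assumes, as is typical for OpenStack, that the floating IP is translated by the network layer and is not configured on any interface inside the instance; in that case the operating system rejects the bind with EADDRNOTAVAIL, which Java reports as "Cannot assign requested address". The helper name `can_bind` is made up for illustration, and 192.0.2.10 is a documentation-only address standing in for the floating IP.

```python
import errno
import socket


def can_bind(ip, port=0):
    """Return True if this host can bind a TCP socket to ip (port 0 = any free port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL: the address is not assigned to any local interface.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise


if __name__ == "__main__":
    # 192.0.2.10 (RFC 5737 documentation range) is a stand-in for a floating IP
    # that is not configured on any local interface.
    for ip in ("127.0.0.1", "192.0.2.10"):
        print(ip, "bindable" if can_bind(ip) else "not bindable")
```

On the master, running the same check with the real fixed and floating IPs should show which of the two the host can actually bind to.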

Bruno

Bruno B. Carvalho answered Oct 18 '22 12:10



As the log shows:

Your hostname, spark-master resolves to a loopback address: 127.0.1.1; using 192.x.x.1 instead (on interface eth0)

Spark automatically tries to determine the host's IP address, and it picks the other IP, 192.x.x.1, rather than the floating IP 87.x.x.1.

To resolve this problem, set SPARK_LOCAL_IP=87.x.x.1 (preferably in spark-env.sh) and start your master again.
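
The fallback described in that warning can be illustrated with a small Python sketch. This is a simplified model of the behaviour the log message reports, not Spark's actual implementation; the function names and the address 192.168.0.1 (standing in for the masked 192.x.x.1) are assumptions made for the example. It only considers addresses that are configured on the host, so an address that is absent from the interface list is never selected.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname in /etc/hosts-style text to its first listed IP."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def pick_address(hostname, hosts_text, interface_ips):
    """Use the hostname's IP unless it is loopback; then fall back to an interface IP."""
    resolved = parse_hosts(hosts_text).get(hostname)
    if resolved and not ipaddress.ip_address(resolved).is_loopback:
        return resolved
    for ip in interface_ips:
        if not ipaddress.ip_address(ip).is_loopback:
            return ip
    return resolved


if __name__ == "__main__":
    # /etc/hosts as posted in the question: spark-master maps to 127.0.1.1.
    hosts = "127.0.0.1 localhost\n127.0.1.1 spark-master\n"
    # Hypothetical interface addresses: loopback plus the fixed IP on eth0.
    print(pick_address("spark-master", hosts, ["127.0.0.1", "192.168.0.1"]))
    # prints 192.168.0.1
```

With the hostname mapped to a loopback address, the sketch falls back to the first non-loopback interface address, which matches the "using 192.x.x.1 instead (on interface eth0)" part of the warning.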

user1314742 answered Oct 18 '22 11:10
