
Apache Spark network ports configuration

When Apache Spark runs in standalone cluster mode, it uses a number of ports for different types of network communication between (among others) the driver and the executors/workers.

Spark release 1.1.0 added quite a number of properties for configuring the ports used, along with a guide: http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security However, it seems one can control only server ports, i.e. the ones being listened on.

I couldn't find a way to control the client ports that a Spark executor/worker opens to connect to the driver program. My driver program runs in Tomcat, and my catalina.policy has to be very specific, allowing only particular IP addresses and ports.

So, is there a way to control all ports used by Spark, so that I can configure socket permissions in the catalina.policy of the Tomcat instance running the driver program and let it communicate with the executors/workers?

EDIT: The error I am getting on the Tomcat side is:

2014-09-19 16:55:42,437 [New I/O server boss #6] WARN  T:[] V:[]o.j.n.c.s.nio.AbstractNioSelector - Failed to accept a connection.
java.security.AccessControlException: access denied ("java.net.SocketPermission" "<worker IP address>:44904" "accept,resolve")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372) ~[na:1.7.0_67]
    at java.security.AccessController.checkPermission(AccessController.java:559) ~[na:1.7.0_67]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[na:1.7.0_67]
    at java.lang.SecurityManager.checkAccept(SecurityManager.java:1170) ~[na:1.7.0_67]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:261) ~[na:1.7.0_67]
    at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:100) ~[netty-3.6.6.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) ~[netty-3.6.6.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[netty-3.6.6.Final.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_67]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_67]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
preeze asked Sep 19 '14 16:09




1 Answer

A client port is typically determined dynamically, at runtime.

The server port is the port that the initial client request connects to. As that request is handled, the connection is established, which (among other things) opens a "client" port on the requesting machine to receive the reply. This client port is carried in the initial request as the TCP source port, and is typically picked from an ephemeral range configured in the client's operating system (or at least, the TCP layer of the client's network stack).
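To see the dynamically assigned client port in practice, here is a minimal sketch using only the JDK's socket classes. It has nothing Spark-specific in it; the class and method names are made up for illustration, and the server binds to port 0 on loopback only so that the example does not collide with anything already running.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: shows that the client side of a TCP connection
// gets a local port chosen by the OS, and that the server sees it.
class EphemeralPortDemo {

    /** Returns {serverPort, clientLocalPort, clientPortAsSeenByServer}. */
    static int[] observePorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a real server would use a fixed one.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             // The client never names its own port; the OS picks one on connect.
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] {
                server.getLocalPort(),
                client.getLocalPort(),
                accepted.getPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = observePorts();
        System.out.println("server (listening) port:     " + ports[0]);
        System.out.println("client local port:           " + ports[1]);
        System.out.println("client port seen by server:  " + ports[2]);
    }
}
```

The last two numbers match each other and usually change from run to run, which is why a rule that names one exact client port tends to break.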

If one could configure a client to use only one port, it would probably introduce issues: when you run two instances of the client program, the second instance would not be able to open its port to receive replies from the server, and the first client could end up with the responses to both clients' requests.
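The conflict between two clients pinned to the same port can be sketched with plain JDK sockets. This assumes the usual OS defaults (SO_REUSEADDR not enabled on the client sockets); the class and method names are hypothetical, and the "fixed" port is simply whatever free port the first socket was given.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: two client sockets trying to use the same local port.
class FixedClientPortDemo {

    /** Returns true if a second socket can bind a local port the first one already holds. */
    static boolean secondBindSucceeds() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (Socket first = new Socket(); Socket second = new Socket()) {
            // Let the OS pick a free port, then treat it as the "fixed" client port.
            first.bind(new InetSocketAddress(loopback, 0));
            int fixedPort = first.getLocalPort();
            try {
                second.bind(new InetSocketAddress(loopback, fixedPort));
                return true;
            } catch (IOException e) {
                // Usually a java.net.BindException ("Address already in use").
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second client could bind the same port: " + secondBindSucceeds());
    }
}
```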

As you are seeing your server fail to open a client (response) port, you likely need to check (in this order):

  1. The networking path from the server to the client (it can be different from the path from the client to the server). If it's ok...
  2. The client firewall configuration. An overzealous firewall configuration might be blocking the client port range, which prevents the client connection request from completing.
  3. The client software / system configuration. While extremely rare, sometimes people configure their systems to put client ports outside the range of what can be supported (this does not seem to be your case). The highest valid port number is 65535.

Odds are you have a garden-variety networking issue, but it could be a firewall issue (or an overzealous virus scanner / firewalling solution).
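Because the client port comes from a range, any permission check on the accepting side has to cover a range of ports rather than one port. A minimal sketch with java.net.SocketPermission, the permission class named in the exception in the question: the address 192.0.2.10 is a documentation placeholder standing in for the worker's IP, 7077 stands in for some single server port, and the lower bound 1024 is an assumption, not something taken from the Spark docs. The helper name is made up for illustration.

```java
import java.net.SocketPermission;

// Illustrative only: compares a granted SocketPermission target
// against the one requested when a connection is accepted.
class PortRangePermissionDemo {

    /** True if a grant for grantedTarget covers an "accept,resolve" request for requestedTarget. */
    static boolean covers(String grantedTarget, String requestedTarget) {
        SocketPermission granted = new SocketPermission(grantedTarget, "accept,resolve");
        SocketPermission requested = new SocketPermission(requestedTarget, "accept,resolve");
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        // 44904 is the client port from the stack trace in the question.
        String requested = "192.0.2.10:44904";
        // "1024-" means port 1024 and above.
        System.out.println("range grant covers it:       " + covers("192.0.2.10:1024-", requested));
        System.out.println("single-port grant covers it: " + covers("192.0.2.10:7077", requested));
    }
}
```

The same "host:portrange" target string is what a permission java.net.SocketPermission entry in a policy file takes, so a grant of this shape for each worker address may be closer to what the driver side needs than a list of exact ports.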

Edwin Buck answered Oct 02 '22 01:10