
How to configure Apache Spark random worker ports for tight firewalls?

I am using Apache Spark to run machine-learning algorithms and other big-data tasks. Previously, I was running a standalone Spark cluster with the master and worker on the same machine. Now that I have added multiple worker machines, a tight firewall forces me to fix the workers' random ports. Can anyone explain how to change the random Spark ports and tell me exactly which configuration file needs to be edited? I read the Spark documentation and it says spark-defaults.conf should be configured, but I don't know how to configure this file, in particular to change the random ports used by Spark.

asked Jan 01 '15 by Isma Khan


People also ask

What is spark.port.maxRetries?

If the spark.port.maxRetries property is at its default (16), here is an example of its effect: if the Spark application web UI is enabled, which it is by default, no more than 17 Spark applications can run at the same time on one host, because the 18th Spark driver process will fail to bind to an application UI port.
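If you need to run more concurrent applications than that on a single host, the retry window can be widened. A minimal sketch, assuming spark-defaults.conf under SPARK_HOME/conf (the value 32 is purely illustrative):

   # spark-defaults.conf -- widen the port retry window (illustrative value)
   spark.port.maxRetries   32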

How do I change the default port for Spark?

By default, you can access the web UI for the master at port 8080. The port can be changed either in the configuration file or via command-line options. In addition, detailed log output for each job is also written to the work directory of each worker node (SPARK_HOME/work by default).
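For the standalone master's web UI specifically, a sketch of both options follows; the port 8090 is just an illustrative choice, not a recommendation:

   # conf/spark-env.sh -- move the master web UI off the default 8080
   export SPARK_MASTER_WEBUI_PORT=8090

   # or pass the port when starting the master from the command line
   ./sbin/start-master.sh --webui-port 8090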

How do I find my Spark driver port?

You don't need to log in to the Hadoop nodes to determine the port. The easiest way is to use the Resource Manager UI, but if you prefer the CLI you can use the yarn command: $ yarn application -status application_1493800575189_0014 . This will show you the tracking URL for the Spark driver.

How does Apache Spark communicate with the network?

Apache Spark makes heavy use of the network for communication between its various processes (illustrated in the source documentation by a figure titled "Network ports used in a typical Apache Spark environment"). The ports Spark uses are further described there in two tables, one for the cluster side and one for the driver side.

How to create a new Apache Spark configuration?

Select Manage > Apache Spark configurations. Click the New button to create a new Apache Spark configuration, or click Import a local .json file to import one into your workspace. The New Apache Spark configuration page opens after you click New. For Name, you can enter your preferred, valid name.

Why do I need to open a port in spark?

For instance, if your application developers need to access the Spark application web UI from outside the firewall, the application web UI port must be open on the firewall. Each time a Spark process is started, a number of listening ports are created that are specific to the intended function of that process.
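For example, on a driver host that uses firewalld, opening the default application web UI port might look like the sketch below; 4040 is only the default, and the actual port (plus the spark.port.maxRetries range above it) depends on your configuration:

   # illustrative firewalld commands -- adjust port and zone to your setup
   sudo firewall-cmd --permanent --add-port=4040/tcp
   sudo firewall-cmd --reload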

How secure is Apache Spark?

Spark Security: Things You Need To Know. Security in Spark is OFF by default. This could mean you are vulnerable to attack by default. Spark supports multiple deployments types and each one supports different levels of security. Not all deployment types will be secure in all environments and none are secure by default.


1 Answer

Update for Spark 2.x


Some libraries have been rewritten from scratch, and many legacy *.port properties are now obsolete (cf. SPARK-10997 / SPARK-20605 / SPARK-12588 / SPARK-17678 / etc.)

For Spark 2.1, for instance, the port ranges on which the driver will listen for executor traffic are

  • between spark.driver.port and spark.driver.port+spark.port.maxRetries
  • between spark.driver.blockManager.port and spark.driver.blockManager.port+spark.port.maxRetries

And the port range on which the executors will listen for driver traffic and/or other executors' traffic is

  • between spark.blockManager.port and spark.blockManager.port+spark.port.maxRetries

The "maxRetries" property allows for running several Spark jobs in parallel; if the base port is already used, then the new job will try the next one, etc, unless the whole range is already used.

Source:
   https://spark.apache.org/docs/2.1.1/configuration.html#networking
   https://spark.apache.org/docs/2.1.1/security.html under "Configuring ports"

answered Oct 02 '22 by Samson Scharfrichter