I am using Apache Spark to run machine learning algorithms and other big data tasks. Previously, I was running Spark in standalone cluster mode, with the master and worker on the same machine. Now I have added multiple worker machines, and due to a tight firewall I have to change the random ports used by the workers. Can anyone help me change Spark's random ports and tell me exactly which configuration file needs to be edited? I read the Spark documentation and it says spark-defaults.conf should be configured, but I don't know how to configure this file, particularly to change Spark's random ports.
With the spark.port.maxRetries property at its default (16), here is one example of the consequence: if the Spark application web UI is enabled, which it is by default, no more than 17 Spark applications can run at the same time, because the 18th Spark driver process will fail to bind to an application UI port.
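As a minimal sketch, assuming the default base UI port of 4040, widening that window only takes two lines in conf/spark-defaults.conf (the value 32 below is purely illustrative):

    # conf/spark-defaults.conf
    # Base port for the application web UI (4040 is the default)
    spark.ui.port           4040
    # Retry up to 32 successive ports (4040-4072) before failing;
    # the default is 16, and 32 here is only an example value
    spark.port.maxRetries   32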
By default, you can access the web UI for the master at port 8080. The port can be changed either in the configuration file or via command-line options. In addition, detailed log output for each job is written to the work directory of each worker node (SPARK_HOME/work by default).
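For instance, the standalone master's web UI port can be set in conf/spark-env.sh or passed when starting the master; the port 8085 below is just an illustration:

    # conf/spark-env.sh -- read by the standalone start scripts
    SPARK_MASTER_WEBUI_PORT=8085

    # ...or equivalently as a command-line option:
    ./sbin/start-master.sh --webui-port 8085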
You don't need to log in to the Hadoop nodes to determine the port. The easiest way is to use the Resource Manager UI, but if you prefer the CLI you can use the yarn command: $ yarn application -status application_1493800575189_0014. This will show you the tracking URL for the Spark driver.
Apache Spark makes heavy use of the network for communication between its various processes, as illustrated in Figure 1 (Network ports used in a typical Apache Spark environment). These ports fall into two groups, described in Table 1 and Table 2: the ports Spark uses on the cluster side and those it uses on the driver side.
Select Manage > Apache Spark configurations. Click the New button to create a new Apache Spark configuration, or click Import to upload a local .json file to your workspace. The New Apache Spark configuration page opens after you click New. For Name, enter a preferred, valid name.
For instance, if your application developers need to access the Spark application web UI from outside the firewall, the application web UI port must be open on the firewall. Each time a Spark process is started, a number of listening ports are created that are specific to the intended function of that process.
Spark Security: Things You Need To Know. Security in Spark is OFF by default, which could mean you are vulnerable to attack out of the box. Spark supports multiple deployment types, and each one supports different levels of security. Not all deployment types are secure in all environments, and none are secure by default.
Update for Spark 2.x
Some libraries have been rewritten from scratch, and many legacy *.port properties are now obsolete (cf. SPARK-10997 / SPARK-20605 / SPARK-12588 / SPARK-17678 / etc.).

For Spark 2.1, for instance, the port ranges on which the driver will listen for executor traffic are

    spark.driver.port to spark.driver.port + spark.port.maxRetries
    spark.driver.blockManager.port to spark.driver.blockManager.port + spark.port.maxRetries

and the port range on which the executors will listen for driver traffic and/or other executor traffic is

    spark.blockManager.port to spark.blockManager.port + spark.port.maxRetries
The "maxRetries" property allows for running several Spark jobs in parallel; if the base port is already used, then the new job will try the next one, etc, unless the whole range is already used.
Sources:
https://spark.apache.org/docs/2.1.1/configuration.html#networking
https://spark.apache.org/docs/2.1.1/security.html (under "Configuring ports")