 

Basic Spark example not working

Tags:

apache-spark

I'm learning Spark and wanted to run the simplest possible cluster consisting of two physical machines. I've done all the basic setup and it seems to be fine. The output of the automatic start script looks as follows:

[username@localhost sbin]$ ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/username/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/username/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.worker.Worker-1-localhost.out
username@192.168.???.??: starting org.apache.spark.deploy.worker.Worker, logging to /home/username/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out

So there are no errors here, and it seems that a Master node is running as well as two Worker nodes. However, when I open the web UI at 192.168.???.??:8080, it only lists one worker - the local one. My issue is similar to the one described in "Spark Clusters: worker info doesn't show on web UI", but there's nothing unusual in my /etc/hosts file. All it contains is:

127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6 

What am I missing? Both machines are running Fedora Workstation x86_64.

Asked Feb 16 '16 by Krzysiek Setlak

People also ask

What is SparkPi?

SparkPi is deployed as a single server pod and several Apache Spark pods. The microservice provides an HTTP server that accepts GET requests and responds with an estimation of Pi, which it calculates with Apache Spark using a Monte Carlo method.
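
For reference, a SparkPi example also ships with the standard Spark download and can be run from the installation directory; this is a minimal sketch assuming the bundled run-example script:

# Run the bundled SparkPi example with 100 sampling partitions
./bin/run-example SparkPi 100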

How do I run Spark submit in cluster mode?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode the driver runs on a host in your driver resource group, and the spark-submit syntax is --deploy-mode cluster.
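
As a hedged illustration (the master URL, class name, and jar path below are placeholders, not taken from the question), a cluster-mode submission against a standalone master might look like this:

# Driver runs inside the cluster rather than on the submitting machine
./bin/spark-submit \
  --master spark://YOUR_SPARK_MASTER_IP:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar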

How do I run Pyspark in terminal?

Go to the Spark installation directory on the command line, type bin/pyspark, and press Enter; this launches the PySpark shell and gives you a prompt for interacting with Spark in Python. If you have added Spark to your PATH, just enter pyspark in the command line or terminal (including on macOS).
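
Using the installation path from the question above, that would look roughly like this:

cd /home/username/spark-1.6.0-bin-hadoop2.6
./bin/pyspark        # launches the interactive PySpark shell

# or, if Spark's bin directory is on your PATH:
pyspark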


1 Answer

It seems that Spark is quite picky about IPs and machine names. When you start your master, it registers itself under your machine's hostname. If that name is not resolvable from your workers, they will have practically no way to reach it.
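
One common fix, offered here only as a hedged suggestion and not part of the original answer, is to make both machines resolvable by name by adding entries to /etc/hosts on each host (the addresses and hostnames below are placeholders):

# /etc/hosts on both machines (placeholder values)
192.168.1.10   spark-master
192.168.1.11   spark-worker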

Alternatively, a workaround is to start your master with an explicit IP, like this:

SPARK_MASTER_IP=YOUR_SPARK_MASTER_IP ${SPARK_HOME}/sbin/start-master.sh
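
If you prefer not to pass the variable on every start, the same setting can be made persistent in conf/spark-env.sh; this is a sketch for Spark 1.x, where the variable is SPARK_MASTER_IP (newer releases use SPARK_MASTER_HOST instead):

# ${SPARK_HOME}/conf/spark-env.sh
SPARK_MASTER_IP=YOUR_SPARK_MASTER_IP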

Then you will be able to connect your slaves (workers) to it like this:

${SPARK_HOME}/sbin/start-slave.sh spark://YOUR_SPARK_MASTER_IP:PORT
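
Putting the two steps together with the default standalone master port (7077) and a placeholder address, the full sequence might look like this:

# on the master machine
SPARK_MASTER_IP=192.168.1.10 ${SPARK_HOME}/sbin/start-master.sh

# on each worker machine
${SPARK_HOME}/sbin/start-slave.sh spark://192.168.1.10:7077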

And there you go!

Answered Oct 02 '22 by dsncode