 

How to run a Spark-java program from command line [closed]

I am running the wordcount java program in spark. How do I run it from the command line.

Pooja3101 asked Mar 07 '14

People also ask

How do I run a spark program from the command line?

Use spark://HOST:PORT for a standalone cluster, replacing HOST and PORT with those of your standalone master. Use local to run locally with one worker thread, or local[k] to run with k worker threads; typically set k to the number of cores on your machine.

How do I run a Java program from the command line?

Type 'javac MyFirstJavaProgram.java' and press enter to compile your code. If there are no errors, the command prompt moves to the next line (assuming the PATH variable is set). Then type 'java MyFirstJavaProgram' to run your program.
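As a concrete instance of those steps, here is a minimal class (the file name is assumed to match the public class name, which Java requires):

```java
// MyFirstJavaProgram.java -- the file name must match the public class name
public class MyFirstJavaProgram {
    public static void main(String[] args) {
        System.out.println("Hello from the command line!");
    }
}
```

Compile with `javac MyFirstJavaProgram.java`, then run with `java MyFirstJavaProgram`; it prints `Hello from the command line!`.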

How do I run a spark command in shell script?

Go to the Apache Spark installation directory from the command line, type bin/spark-shell, and press enter; this launches the Spark shell and gives you a Scala prompt to interact with Spark in the Scala language. If you have added Spark to your PATH, just enter spark-shell in a command line or terminal (Mac users).


1 Answer

Pick up the wordcount example from, say, https://github.com/holdenk/fastdataprocessingwithsparkexamples/tree/master/src/main/scala/pandaspark/examples. Follow these steps to create the fat jar file:

mkdir example-java-build/; cd example-java-build

mvn archetype:generate \
   -DarchetypeGroupId=org.apache.maven.archetypes \
   -DgroupId=spark.examples \
   -DartifactId=JavaWordCount \
   -Dfilter=org.apache.maven.archetypes:maven-archetype-quickstart

cp ../examples/src/main/java/spark/examples/JavaWordCount.java \
   JavaWordCount/src/main/java/spark/examples/JavaWordCount.java
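If you don't have the source file handy, the program being copied looks roughly like this -- a sketch against the Spark 1.x Java API (no lambdas, since that API targets Java 7), with the package and class name matching the paths above:

```java
package spark.examples;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public final class JavaWordCount {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: JavaWordCount <file>");
            System.exit(1);
        }
        SparkConf conf = new SparkConf().setAppName("JavaWordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);

        // Split each line into words.
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String line) {
                return Arrays.asList(line.split(" "));
            }
        });

        // Pair each word with a count of 1.
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String word) {
                return new Tuple2<String, Integer>(word, 1);
            }
        });

        // Sum the counts per word.
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) {
                return a + b;
            }
        });

        // Print "word: count" for every distinct word.
        for (Tuple2<String, Integer> pair : counts.collect()) {
            System.out.println(pair._1() + ": " + pair._2());
        }
        sc.stop();
    }
}
```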

Add the relevant spark-core and spark-examples dependencies to the generated pom.xml, making sure they match your version of Spark. I use Spark 1.1.0, so my pom.xml looks like this:

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-examples_2.10</artifactId>
      <version>1.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.1.0</version>
    </dependency>
  </dependencies>

Build your jar file using mvn.

cd example-java-build/JavaWordCount
mvn package

This creates your fat jar file inside the target directory. Copy the jar file to any location on the server, then go to the bin folder of your Spark installation (in my case: /root/spark-1.1.0-bin-hadoop2.4/bin).

Submit the Spark job. Mine looks like this:

./spark-submit --class "spark.examples.JavaWordCount" --master yarn://myserver1:8032 /root/JavaWordCount-1.0-SNAPSHOT.jar  hdfs://myserver1:8020/user/root/hackrfoe.txt

Here:

--class is the entry point for your application (e.g. org.apache.spark.examples.SparkPi).
--master is the master URL for the cluster (e.g. spark://23.195.26.187:7077).
The last argument is any text file of your choice for the program to count.

The output should look like this, giving the counts of all words in the text file:

in: 17
sleeping.: 1
sojourns: 1
What: 4
protect: 1
largest: 1
other: 1
public: 1
worst: 1
hackers: 12
detected: 1
from: 4
and,: 1
secretly: 1
breaking: 1
football: 1
answer.: 1
attempting: 2
"hacker: 3

Hope this helps!

user1189851 answered Oct 13 '22