How to get output after running Apache Spark job on web

I'm a student learning Hadoop and Apache Spark. I want to know how to get the output of an Apache Spark job on the web.

The following is a very simple PHP script that runs an Apache Spark job from the web; I just want to test it.

<?php
echo shell_exec("spark-submit --class stu.ac.TestProject.App --master spark://localhost:7077 /TestProject-0.0.1-SNAPSHOT.jar");
?>

And the following is example Java code for the Apache Spark job.

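package stu.ac.TestProject;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

// Estimates Pi with a Monte Carlo simulation on a Spark cluster.
public class App
{
    public static void main(String[] args)
    {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        sparkConf.setMaster("spark://localhost:7077");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Number of partitions; defaults to 2 when no argument is given.
        int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            l.add(i);
        }
        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

        // 1 if a random point in [-1, 1] x [-1, 1] falls inside the unit circle, else 0.
        JavaRDD<Integer> countRDD = dataSet.map(new Function<Integer, Integer>() {
            public Integer call(Integer arg0) throws Exception {
                double x = Math.random() * 2 - 1;
                double y = Math.random() * 2 - 1;
                return (x * x + y * y < 1) ? 1 : 0;
            }
        });

        // Sum the hits; reduce() returns the total to the driver program.
        int count = countRDD.reduce(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer arg0, Integer arg1) throws Exception {
                return arg0 + arg1;
            }
        });

        // Printed by the driver, i.e. to spark-submit's stdout in client deploy mode (the default).
        System.out.println("Pi is roughly " + 4.0 * count / n);
        jsc.stop();
    }
}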

I only want the standard output, but after running the script I get empty output. I built this Java code as a Maven project and also checked that it runs from the command line.

How can I solve it?

Thanks in advance for your answers, and sorry for my poor English. If anything in my question is unclear, please leave a comment.

asked Oct 11 '14 13:10 by Likoed



1 Answer

A job's output stays in the job, so to speak. Spark is fast, but not so fast that it can generate the data instantly: a job runs on a distributed cluster, and that takes some time.

You'll have to write your output somewhere, typically into a database that you can then query from your web application. Rather than starting the job from your web application, schedule it according to your application's needs.
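To illustrate the "write it somewhere, then query it" pattern, here is a minimal sketch that swaps the database for a plain file so it runs with only the JDK. The class name `ResultStore` and the file name `pi.txt` are hypothetical, not part of Spark. The job's driver would call `write(...)` once it has computed `4.0 * count / n`, and the web application would call `read()` on each request, getting an empty result until the job has finished.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

/**
 * Hypothetical file-based result store standing in for a database.
 * Writer: the Spark driver after the job finishes. Reader: the web application.
 */
public class ResultStore {
    private final Path file;

    public ResultStore(Path file) {
        this.file = file;
    }

    /** Writes to a temp file in the same directory, then renames it, so readers never see a half-written value. */
    public void write(double value) throws IOException {
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "result", ".tmp");
        Files.write(tmp, Double.toString(value).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Returns the last stored value, or empty if no job has written one yet. */
    public Optional<Double> read() throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Optional.of(Double.parseDouble(text));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spark-results");
        ResultStore store = new ResultStore(dir.resolve("pi.txt"));
        System.out.println("before job: " + store.read()); // Optional.empty
        store.write(3.14159); // in the real job: 4.0 * count / n
        System.out.println("after job: " + store.read());  // Optional[3.14159]
    }
}
```

With a real database the shape stays the same: the job inserts a row when it completes, and the web page only ever runs a query, so a page request never has to wait for the cluster.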

If you run your job from within a Java, Scala, or Python program, you can retrieve its result directly. With PHP I'm not so sure.
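One way to read "retrieve its result directly" from Java is to launch `spark-submit` as a child process and read everything it prints. The sketch below assumes `spark-submit` is on the `PATH` and reuses the class and jar names from the question; `CommandRunner` is a hypothetical helper, not a Spark API. It merges stderr into stdout, which is worth noting because PHP's `shell_exec` returns only stdout. If `spark-submit` fails (for example because the web server's user cannot find it), the error text goes to stderr, which may be one reason the question's page came back empty. Spark 1.4 and later also ship `org.apache.spark.launcher.SparkLauncher` for starting jobs programmatically.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs a command and returns its exit code plus combined stdout/stderr. */
public class CommandRunner {

    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so error messages are not lost
        Process process = pb.start();

        // Drain the output before waiting, so a full pipe buffer cannot block the child process.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        int exitCode = process.waitFor();
        return new Result(exitCode, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Same command as the PHP script in the question; assumes spark-submit is on PATH.
        Result r = run(Arrays.asList(
                "spark-submit",
                "--class", "stu.ac.TestProject.App",
                "--master", "spark://localhost:7077",
                "/TestProject-0.0.1-SNAPSHOT.jar"));
        System.out.println("exit code: " + r.exitCode);
        System.out.println(r.output);
    }
}
```

The output will contain Spark's own log lines as well as the "Pi is roughly ..." line, so a caller that wants only the result would still need to pick that line out, or better, have the job write its result somewhere as described above.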

answered Sep 24 '22 19:09 by Marius Soutier
