Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apache Spark: SparkPi Example

Tags:

apache-spark

I'm trying to run the Spark Examples and I just don't understand what's going on. I used

MASTER=spark://Illidan:7077 ./bin/run-example SparkPi 10

which does start the process, but all I get is INFO Messages.

So what is the "10" for?

Can the INFO Messages be turned of?

Where is the Output? Where is the calculated Pi?

Can I start the example from the shell? Do I have to start it from the spark shell to see the prints or is it saved in some file I don't know of?

I swear to God I have been going through the documentation a hundred times. I need some help.

Hier is a small snippet of my terminal output. Thanks in advance. :D

14/12/31 00:02:25 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.5:44913 (size: 1295.0 B, free: 267.3 MB)
14/12/31 00:02:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 25231 ms on 192.168.2.7 (8/10)
14/12/31 00:02:26 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 25358 ms on 192.168.2.5 (9/10)
14/12/31 00:02:26 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.4:36505 (size: 1295.0 B, free: 267.3 MB)
14/12/31 00:02:27 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 25877 ms on 192.168.2.4 (10/10)
14/12/31 00:02:27 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
like image 977
Nima Mousavi Avatar asked Dec 31 '14 00:12

Nima Mousavi


2 Answers

As you see, 10 is the number of partitions (or slices) created by the spark program. The job of computing PI has been divided in 10 tasks (PI is computed through an iterative algorithm).

The output shows that the job completed successfully. You should also see a row with the result.

You can find the source code of the PI example here.

like image 74
Nicola Ferraro Avatar answered Oct 11 '22 13:10

Nicola Ferraro


OrangePi One SBC

  • CPU: 1.6GHz H3 Quad-core Cortex-A7 H.265/HEVC 4K

  • GPU: Mali400MP2 GPU @600MHz, Supports OpenGL ES 2.0

  • RAM: 512MB DDR3 (shared with GPU)

  • Armbian OS Debian GNU/Linux 8 (jessie) 3.4.112-sun8i

My observation is that on OrangePi, the execution is SINGLE THREADED. I was expecting 4 parallel tasks, one per core. Please see the data below. I will see what can be optimized for existing cores, or Mali GPU (~7 GigaFLOPS).

root@orangepione:~/spark/spark-2.0.0-bin-hadoop2.7# ./bin/run-example SparkPi 10

  • where 10 is number of distributed tasks/partitions/slices/threads

executed as 1 task on a single 4 core board

  • took 19.0 s Pi is roughly 3.145951459514595
  • took 19.0 s Pi is roughly 3.1346713467134673

executed as 2 tasks on a single 4 core board

  • took 19.3 s Pi is roughly 3.1420757103785517
  • took 19.4 s Pi is roughly 3.13639568197841

executed as 4 tasks on a single 4 core board

  • took 21.2 s Pi is roughly 3.141427853569634
  • took 21.5 s Pi is roughly 3.1445478613696536

executed as 10 tasks on a single 4 core board

  • took 40.8 s Pi is roughly 3.143983143983144
  • took 40.4 s Pi is roughly 3.141019141019141

executed as 50 tasks on a single 4 core board

  • took 156.5 s Pi is roughly 3.1399118279823655
like image 32
Uki D. Lucas Avatar answered Oct 11 '22 14:10

Uki D. Lucas