I've read about Spark and found out that it is written in Scala. Since Scala is a functional language, like Erlang, it can use multiple cores properly. Is that correct?
I'm wondering whether I can use Spark in a distributed system whose nodes have multicore processors. Can a single task use all the cores at the same time? I've read that YARN will assign different cores to different tasks, but in this case it is a single task.
Also, is it enough to use multi-threaded programming in Java (Hadoop) to use all the cores on each machine, given that the Linux scheduler schedules threads?
Yes, it can, as this is its stated purpose: to split and parallelize whatever is parallelizable. You can even specify the amount of memory to be used by each executor.
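For example, here is a minimal sketch of setting per-executor memory and cores through `SparkConf` (the application name and the numeric values are illustrative, not recommendations):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values; tune them to your cluster.
val conf = new SparkConf()
  .setAppName("MultiCoreExample")          // hypothetical app name
  .set("spark.executor.memory", "4g")      // memory per executor
  .set("spark.executor.cores", "4")        // cores per executor (YARN/standalone)

val sc = new SparkContext(conf)
```

The same settings can also be passed on the command line via `spark-submit --executor-memory 4g --executor-cores 4`.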
However, some work cannot be parallelized, which is why Spark sometimes occupies only one core.
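The degree of parallelism within a single job also depends on how the data is partitioned. A rough sketch (it assumes a `SparkContext` named `sc`, as in the shell; the data and partition counts are made up for illustration):

```scala
// An RDD with a single partition runs its stage on one core,
// no matter how many cores the executors have.
val single = sc.parallelize(1 to 1000000, numSlices = 1)

// Repartitioning spreads the same work across more cores.
val spread = single.repartition(8)
println(spread.map(_ * 2).count())
```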
If you use the Spark shell, make sure you set the number of cores to use, as described in the answer to this question: Why is Spark not using all cores on local machine
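In local mode this is controlled by the master URL, e.g. `spark-shell --master "local[*]"` to use all available cores. The same can be set in code; a minimal sketch (the app name is hypothetical):

```scala
import org.apache.spark.SparkConf

// "local"    -> run on a single core
// "local[4]" -> use exactly four cores
// "local[*]" -> use all cores the JVM can see
val localConf = new SparkConf()
  .setAppName("LocalAllCores")   // hypothetical app name
  .setMaster("local[*]")
```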
Source: official Spark docs https://spark.apache.org/docs/latest/configuration.html