
Spark vs Flink low memory available

I have built a k-means application in both Spark and Flink. My test case clusters 1 million points on a 3-node cluster.

When in-memory bottlenecks begin, Flink starts to spill to disk; it runs slowly, but it works. Spark, however, loses executors when memory is full and starts again (an infinite loop?).

I tried to adjust the memory settings with help from the mailing list (thanks), but Spark still does not work.

Do any configuration options need to be set? Flink works with low memory, so Spark should be able to as well, or not?

Pa Rö asked Aug 11 '15 07:08





1 Answer

I am not a Spark expert (I am a Flink contributor). As far as I know, Spark cannot spill to disk when there is not enough main memory, which is one advantage of Flink over Spark. However, Spark has announced a new project called "Tungsten" to enable managed memory similar to Flink's. I don't know whether this feature is already available: https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html

There are a couple of SO questions about Spark out-of-memory problems (an Internet search for "spark out of memory" yields many results, too):

spark java.lang.OutOfMemoryError: Java heap space
Spark runs out of memory when grouping by key
Spark out of memory

Maybe one of those helps.
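For illustration only, here is a minimal sketch of how memory-related settings might be passed to spark-submit. The property names spark.executor.memory, spark.driver.memory and spark.default.parallelism are standard Spark properties, and --conf is a standard spark-submit option. The values, the helper name build_submit_args, and the script name kmeans_app.py are assumptions made for this example, not recommendations taken from this thread or from the linked questions.

```python
def build_submit_args(settings):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key in sorted(settings):  # sorted so the command line is reproducible
        args.extend(["--conf", f"{key}={settings[key]}"])
    return args


# Hypothetical values for a small 3-node cluster; tune them for the real hardware.
settings = {
    "spark.executor.memory": "2g",      # heap size per executor
    "spark.driver.memory": "1g",        # heap size of the driver
    "spark.default.parallelism": "48",  # more partitions means smaller partitions per task
}

submit_args = build_submit_args(settings)

# kmeans_app.py is a placeholder for the actual application.
print(" ".join(["spark-submit"] + submit_args + ["kmeans_app.py"]))
```

Whether such settings are enough to keep executors alive depends on the data size and on the Spark version, so treat this as a starting point for experiments rather than a fix.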

Matthias J. Sax answered Sep 20 '22 13:09
