Could anyone please explain the difference between fine-grained and coarse-grained transformations in the context of Spark? I was reading the paper on RDDs (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf) and it is not clear to me how coarse-grained transformations provide fault tolerance efficiently.
The word 'granular' is used to describe something that is made up of multiple elements. If the elements are small, we call it "fine-grained," and if the elements are large, we call it "coarse-grained." These are terms typically used in economics, computer science and geology.
A coarse-grained operation applies the same operation to all the objects in a dataset at once. A fine-grained operation applies to a smaller set, down to a single element. Spark generally uses coarse-grained operations, since each one works on the entire dataset across the cluster simultaneously. We can also cache an RDD and control its partitioning manually.
The RDD has been the primary user-facing API in Spark since its inception. At its core, an RDD is an immutable, distributed collection of data elements, partitioned across the nodes of your cluster, that can be operated on in parallel through a low-level API offering transformations and actions.
(a) Said of a crystalline rock, and of its texture, in which the individual minerals are relatively large; specif. said of an igneous rock whose particles have an average diameter greater than 5 mm (0.2 in.).
A fine-grained update is an update to a single record, as in a database, whereas coarse-grained operations are generally functional operators (like those used in Spark), for example map, reduce, flatMap and join. Spark's model takes advantage of this: once it has saved your DAG of operations (which is small compared to the data you are processing), it can use that DAG to recompute lost data as long as the original data is still there. With fine-grained updates, recomputing this way is impractical, because saving the updates could cost as much as saving the data itself. If you update each of a billion records separately, you have to save the information needed to recompute each update, whereas with a coarse-grained operation you save one function that updates all billion records. This comes at the cost of being less flexible than a fine-grained model.
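To make the cost argument above concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: the names `apply_lineage`, `source`, `lineage` and `fine_grained_log` are all hypothetical, and the "partitions" are just small lists. It assumes the source data stays available, rebuilds a lost partition by replaying the recorded functions, and compares the size of that record with a per-record update log.

```python
from typing import Callable, List

Partition = List[int]
Transform = Callable[[Partition], Partition]

def apply_lineage(partition: Partition, lineage: List[Transform]) -> Partition:
    """Replay every recorded transformation, in order, on one source partition."""
    for transform in lineage:
        partition = transform(partition)
    return partition

# Source data split into three partitions (assumed to remain available).
source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Coarse-grained: two functions describe what happens to every record.
lineage: List[Transform] = [
    lambda part: [x * 10 for x in part],            # plays the role of map
    lambda part: [x for x in part if x % 20 == 0],  # plays the role of filter
]

derived = [apply_lineage(part, lineage) for part in source]

# Simulate losing one derived partition, then rebuild it
# from the matching source partition and the lineage alone.
expected = derived[1]
derived[1] = None
derived[1] = apply_lineage(source[1], lineage)
recovered_ok = derived[1] == expected

# Fine-grained alternative: one log entry per updated record,
# so the log grows with the data instead of with the program.
fine_grained_log = [
    (part_idx, rec_idx, value * 10)
    for part_idx, part in enumerate(source)
    for rec_idx, value in enumerate(part)
]

lineage_size = len(lineage)
log_size = len(fine_grained_log)
```

The lineage stays at two entries however many records there are, while the per-record log has one entry for each record. That is the trade-off described above, shown at a very small scale.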