Fine-grained transformations vs coarse-grained transformations

Could anyone please explain the difference between fine-grained and coarse-grained transformations in the context of Spark? I was reading the paper on RDDs (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf) and it is not clear to me how coarse-grained transformations provide fault tolerance efficiently.

487

asked Oct 04 '14 17:10

Amar

1 Answer

A fine-grained update is an update to a single record, as in a database. A coarse-grained transformation is generally a functional operator applied to the whole dataset (like the ones used in Spark), for example map, reduce, flatMap, join. Spark's model takes advantage of this: once it has saved your DAG of operations, which is small compared to the data you are processing, it can use that DAG to recompute lost data as long as the original data is still there. With fine-grained updates you cannot recompute this cheaply, because saving the updates could cost as much as saving the data itself. If you update each record out of billions separately, you have to save the information needed to redo each update, whereas with coarse-grained transformations you can save one function that updates a billion records. This comes at the cost of not being as flexible as a fine-grained model.
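To make the recomputation argument concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `source_partitions`, `lineage`, `compute_partition` and `fine_grained_log` are hypothetical names made up for illustration, and the two lambdas only stand in for map-like and filter-like operators.

```python
from typing import Callable, List

# Toy model (NOT Spark's API): the original data, split into partitions.
# The argument assumes this source data is still available after a failure.
source_partitions: List[List[int]] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Coarse-grained lineage: a short list of functions, each applied to
# every record of a partition. Its size does not grow with the data.
lineage: List[Callable[[List[int]], List[int]]] = [
    lambda part: [x * 10 for x in part],            # map-like step
    lambda part: [x for x in part if x % 20 == 0],  # filter-like step
]


def compute_partition(index: int) -> List[int]:
    """Rebuild one derived partition by replaying the lineage on its source."""
    part = source_partitions[index]
    for step in lineage:
        part = step(part)
    return part


# Derive the dataset once.
derived = [compute_partition(i) for i in range(len(source_partitions))]

# Simulate losing one partition, then recover only that partition.
derived[1] = None
derived[1] = compute_partition(1)

# Fine-grained alternative for comparison: one log entry per updated
# record, so the log grows with the number of records.
fine_grained_log = [
    (p, i, value * 10)
    for p, part in enumerate(source_partitions)
    for i, value in enumerate(part)
]

print(derived)                # [[20], [40, 60], [80]]
print(len(lineage))           # 2
print(len(fine_grained_log))  # 9
```

In this sketch the lineage stays at two entries no matter how many records there are, while the per-record log has one entry for each record touched. Only the lost partition is recomputed; the other partitions are left alone.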

163

answered Sep 22 '22 05:09

aaronman


Tags:

apache-spark

rdd

hadoop
