Where is Spark RDD lineage stored? According to the RDD white paper it is kept in memory, but I want to know whether it lives on the driver side or somewhere else on the cluster.
Also, how is fault tolerance ensured, i.e. how many replicas of the RDD (metadata) are created by default?
I want to understand the core framework behaviour when we are not using the persist() method.
The RDD lineage lives on the driver, where the RDD objects themselves live. When jobs are submitted, this information is no longer relevant. It is an internal part of every RDD, and that's how an RDD knows its parents.
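To make "an internal part of every RDD" concrete, here is a minimal toy model in Python. It is not Spark's implementation: `ToyRDD`, its methods, and the operation names are hypothetical, chosen only to echo Spark's. Each object stores just a reference to its parent and the function that derives its partitions, and following those references gives the lineage. In real Spark, `rdd.toDebugString` prints the actual lineage of an RDD.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical, heavily simplified model of an RDD (not Spark's class).

    Only the root holds (a pointer to) input data. Every other instance stores
    a reference to its parent plus the function that derives each of its
    partitions from the parent's partition. Following parents = the lineage.
    """

    def __init__(self, name: str, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None,
                 source: Optional[List[List[int]]] = None) -> None:
        self.name = name
        self.parent = parent
        self.fn = fn
        self.source = source  # set only on the root

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Lazy: records the transformation, computes nothing yet.
        return ToyRDD("map", parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD("filter", parent=self,
                      fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self) -> int:
        node = self
        while node.parent is not None:
            node = node.parent
        return len(node.source)

    def compute(self, i: int) -> List[int]:
        # Rebuild partition i by walking the lineage back to the source.
        if self.parent is None:
            return list(self.source[i])
        return self.fn(self.parent.compute(i))

    def lineage(self) -> List[str]:
        # Child-to-root list of operation names, like reading toDebugString top-down.
        names: List[str] = []
        node: Optional[ToyRDD] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def collect(self) -> List[int]:
        # An "action": nothing is cached, so every call recomputes the chain.
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]


base = ToyRDD("parallelize", source=[[1, 2, 3], [4, 5, 6]])
result = base.map(lambda x: x * 10).filter(lambda x: x > 20)
print(result.lineage())  # ['filter', 'map', 'parallelize']
print(result.collect())  # [30, 40, 50, 60]
```

Because the toy caches nothing, every `collect()` walks the whole chain again, which broadly mirrors what happens in Spark when `persist()` is not used.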
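When the driver fails, the RDD lineage is gone, and so is the entire computation. The driver is... well... the driver, and without it nothing really happens. The lineage itself is not replicated: Spark achieves fault tolerance by recomputing lost partitions from their parents using that lineage, not by keeping copies of RDD metadata, and RDD data is only replicated if you explicitly persist it with a replicated storage level.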
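The recovery path, and why it depends on the driver, can be sketched with another hypothetical toy model (`ToyDriver`, `ToyExecutor` and `fetch` are illustrative names, not Spark APIs). The driver holds the lineage as a map from RDD name to (parent, function); an executor holds materialized partitions. A lost partition is rebuilt from the lineage, but with no driver there is nothing to rebuild from. As a simplification, the toy runs the recomputation inside the driver object; in real Spark the driver only reschedules tasks and executors do the work.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lineage table: rdd name -> (parent name or None, partition function)
Lineage = Dict[str, Tuple[Optional[str], Callable[[List[int]], List[int]]]]


class ToyDriver:
    def __init__(self, source: List[List[int]]) -> None:
        self.source = source  # stands in for stable input storage
        self.lineage: Lineage = {}

    def define(self, name: str, parent: Optional[str],
               fn: Callable[[List[int]], List[int]]) -> None:
        self.lineage[name] = (parent, fn)

    def recompute(self, name: str, i: int) -> List[int]:
        # Walk the lineage back to the source, then reapply each function.
        parent, fn = self.lineage[name]
        data = self.source[i] if parent is None else self.recompute(parent, i)
        return fn(data)


class ToyExecutor:
    def __init__(self) -> None:
        # (rdd name, partition index) -> materialized partition
        self.blocks: Dict[Tuple[str, int], List[int]] = {}


def fetch(driver: Optional[ToyDriver], executor: ToyExecutor,
          name: str, i: int) -> List[int]:
    key = (name, i)
    if key in executor.blocks:
        return executor.blocks[key]
    if driver is None:
        # No driver means no lineage, so a lost partition cannot be rebuilt.
        raise RuntimeError("driver is gone: no lineage to recompute from")
    executor.blocks[key] = driver.recompute(name, i)
    return executor.blocks[key]


driver = ToyDriver([[1, 2, 3], [4, 5, 6]])
driver.define("base", None, list)
driver.define("doubled", "base", lambda p: [x * 2 for x in p])
executor = ToyExecutor()
print(fetch(driver, executor, "doubled", 1))  # [8, 10, 12]
del executor.blocks[("doubled", 1)]           # simulate a lost partition
print(fetch(driver, executor, "doubled", 1))  # recomputed: [8, 10, 12]
```

Note that no replica of the lost block exists anywhere in the toy; recovery relies entirely on the lineage the driver keeps.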