Where is Spark RDD lineage stored?

Question

Where is Spark RDD lineage stored? According to the RDD white paper, it is kept in memory, but I want to know whether that is on the driver side or somewhere else in the cluster.

Also, how is fault tolerance ensured, i.e. how many replicas of the RDD (metadata) are created by default?

I want to understand the core framework behaviour when we are not using the persist() method.

Jacek Laskowski · Accepted Answer

The RDD lineage lives on the driver, where the RDD objects themselves live. Once jobs are submitted, this information is no longer relevant. The lineage is an internal part of every RDD, and that is how an RDD knows its parents.

When the driver fails, the RDD lineage is gone, as is the entire computation. The driver is...well...the driver, and without it nothing really happens.
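
To make the "internal part of any RDD" point concrete, here is a toy model in plain Python. It is only a sketch: `ToyRDD` and its methods are hypothetical names invented for illustration and are not Spark's API. The idea it tries to show is that each RDD object holds a reference to its parent, so the lineage is nothing more than that chain of objects in the driver's memory, and a partition can be rebuilt by replaying the chain. In real Spark (Scala API) you can inspect the same information with `rdd.toDebugString` and `rdd.dependencies`.

```python
# Toy model only: these classes are hypothetical and are NOT Spark's API.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """A driver-side object that describes how to compute a dataset."""
    name: str
    parent: Optional["ToyRDD"] = None             # the lineage link
    fn: Optional[Callable[[list], list]] = None   # applied to the parent's partition
    source: Optional[List[list]] = None           # partitions, set only on a root RDD

    def map(self, f, name="map"):
        return ToyRDD(name, parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred, name="filter"):
        return ToyRDD(name, parent=self, fn=lambda part: [x for x in part if pred(x)])

    def lineage(self):
        """Walk the parent links from this RDD back to the root."""
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def compute(self, partition_index):
        """Rebuild one partition from scratch by replaying the lineage."""
        if self.parent is None:
            return list(self.source[partition_index])
        return self.fn(self.parent.compute(partition_index))


root = ToyRDD("parallelize", source=[[1, 2, 3], [4, 5, 6]])
evens_squared = root.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.lineage())   # ['map', 'filter', 'parallelize']
print(evens_squared.compute(1))  # [16, 36]
```

Nothing in this model is replicated: if the process holding the `ToyRDD` objects dies, the chain is gone with it, which mirrors the point about the driver above.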

Tags:

apache-spark

rdd

Bhavuk Chawla
