Delta Lake rollback

I need an elegant way to roll back a Delta Lake table to a previous version.

My current approach is listed below:

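import io.delta.tables._

// Handle to the table (not used by the overwrite below)
val deltaTable = DeltaTable.forPath(spark, testFolder)

// Read version 0 via time travel, then overwrite the table with that snapshot
spark.read.format("delta")
  .option("versionAsOf", 0)
  .load(testFolder)
  .write
  .mode("overwrite")
  .format("delta")
  .save(testFolder)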

This is ugly, though, because the whole data set needs to be rewritten. It seems that a metadata update should be sufficient, with no data I/O required. Does anyone know a better approach?

Fang Zhang asked Aug 26 '19 22:08



1 Answer

As of Delta Lake 1.2.0, you can roll back a Delta Lake table to an earlier version using the RESTORE command. This is much simpler than reading an old version with time travel and overwriting the table with it.

Scala:

import io.delta.tables._

val deltaTable = DeltaTable.forPath(spark, "/path/to/delta-table")

deltaTable.restoreToVersion(0)

Python:

from delta.tables import DeltaTable

deltaTable = DeltaTable.forPath(spark, "/path/to/delta-table")

deltaTable.restoreToVersion(0)

SQL:

RESTORE TABLE delta.`/path/to/delta-table` TO VERSION AS OF 0

To roll back to a point in time rather than a version number, use the restoreToTimestamp method instead. See the Delta Lake documentation for more details.
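As a rough sketch of how the version and timestamp variants might be combined, the Python helper below dispatches to restoreToVersion or restoreToTimestamp depending on which argument is given. The name `roll_back` and its parameters are hypothetical and not part of the Delta Lake API; the sketch only assumes that the object passed in is a `delta.tables.DeltaTable`.

```python
from typing import Any, Optional


def roll_back(delta_table: Any,
              version: Optional[int] = None,
              timestamp: Optional[str] = None) -> Any:
    """Restore a Delta table to a version or to a timestamp.

    `roll_back` is a hypothetical convenience wrapper. `delta_table` is
    assumed to be a delta.tables.DeltaTable. Exactly one of `version`
    or `timestamp` must be supplied.
    """
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return delta_table.restoreToVersion(version)
    # Timestamp as a string, e.g. "2019-08-26" or "2019-08-26 22:08:00".
    return delta_table.restoreToTimestamp(timestamp)


# Usage with a real table (needs a Spark session configured for Delta Lake):
#   from delta.tables import DeltaTable
#   table = DeltaTable.forPath(spark, "/path/to/delta-table")
#   roll_back(table, timestamp="2019-08-26")
```

The SQL counterpart of the timestamp form should be RESTORE TABLE ... TO TIMESTAMP AS OF '<timestamp>'. In either form, the restore can only succeed if the data files of the target version still exist, so it may fail after those files have been removed by VACUUM.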

Crash Override answered Oct 16 '22 10:10