How can I find the difference between the last two versions of a Delta table? Here is how far I got using DataFrames:
val df1 = spark.read
  .format("delta")
  .option("versionAsOf", 1L)
  .load("/path/to/my/table")
val df2 = spark.read
  .format("delta")
  .option("versionAsOf", 2L)
  .load("/path/to/my/table")
// Non-idiomatic: rows that appear in only one of the two versions (symmetric difference)
df1.union(df2).except(df1.intersect(df2))
There is a commercial version of Delta by Databricks that provides a solution called CDF (Change Data Feed), but I'm looking for an open-source alternative.
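A sketch of a more direct version of the approach above, assuming open-source Delta Lake's `io.delta.tables.DeltaTable` API is on the classpath. It looks up the two most recent version numbers from the table history instead of hardcoding them, then runs `exceptAll` in each direction so that added and removed rows are reported separately. `LastTwoVersionsDiff` and `diffLastTwoVersions` are hypothetical names. `exceptAll` needs Spark 2.4+, and both versions must have the same schema.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not part of Delta Lake itself.
object LastTwoVersionsDiff {
  // Returns (rowsAdded, rowsRemoved) between the two most recent versions
  // of the Delta table at `path`. Requires at least two versions.
  def diffLastTwoVersions(spark: SparkSession, path: String): (DataFrame, DataFrame) = {
    // history(2) returns the latest two commits in reverse chronological order
    val versions = DeltaTable.forPath(spark, path)
      .history(2)
      .select("version")
      .collect()
      .map(_.getLong(0))
    require(versions.length == 2, s"table at $path has fewer than two versions")
    val Array(latest, previous) = versions

    // Time travel to a specific version of the table
    def readVersion(v: Long): DataFrame =
      spark.read.format("delta").option("versionAsOf", v).load(path)

    val before = readVersion(previous)
    val after = readVersion(latest)
    // exceptAll keeps duplicates: a row present twice in `after` and once
    // in `before` is reported once in the added set
    (after.exceptAll(before), before.exceptAll(after))
  }
}
```

Unlike `except`, which behaves like `EXCEPT DISTINCT`, `exceptAll` preserves duplicate rows, which matters for tables without a unique key.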
This returns a DataFrame describing the differences between the two versions, using the spark-extension library from G-Research:
import uk.co.gresearch.spark.diff.DatasetDiff
df1.diff(df2)
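A sketch extending the answer above, assuming the same spark-extension library and a table with a unique key column, called `id` here (a hypothetical name; substitute your table's key). Per the library's documentation, the result gets a `diff` column holding `N` (unchanged), `C` (changed), `D` (deleted) or `I` (inserted). Without id columns every column acts as the key, so a modified row shows up as a `D` plus an `I` pair. Passing the key lets the library report it as a single `C` row with `left_`/`right_` value columns. `ChangedRows` is likewise a hypothetical name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import uk.co.gresearch.spark.diff._

// Hypothetical helper built on spark-extension's diff.
object ChangedRows {
  // Keeps only the rows that differ between `before` and `after`.
  // Assumes "id" uniquely identifies a row in both versions.
  def changes(before: DataFrame, after: DataFrame): DataFrame =
    before.diff(after, "id")         // adds a "diff" column: N, C, D or I
      .where(col("diff") =!= "N")    // drop rows that did not change
}
```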