
How to use Delta with Spark 3.0 Preview?

Spark 3.0 is not able to save a DataFrame as a Delta table in HDFS.

  • Scala version 2.12.10
  • Spark version 3.0 Preview

I am able to do it in Spark 2.4.4, but the partitions are not getting created.

Input sample:

Vehicle_id|model|brand|year|miles|intake_date_time
v0001H|verna|Hyundai|2011|5000|2018-01-20 06:30:00
v0001F|Eco-sport|Ford|2013|4000|2018-02-10 06:30:00
v0002F|Endeavour|Ford|2011|8000|2018-04-12 06:30:00
v0001L|Gallardo|Lambhorghini|2013|2000|2018-05-16 06:30:00

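// Reading
// Note: "month" is selected below but does not appear in the sample header above;
// it has to exist in the real input file for this select to resolve.
val deltaTableInput1 = spark.read
  .format("csv") // built-in CSV source since Spark 2.0
  .option("header", "true")
  .option("delimiter", "|")
  .option("inferSchema", "true")
  .load("file")
  .selectExpr(
    "Vehicle_id", "model", "brand", "year", "month", "miles",
    // Rearranges the date parts of intake_date_time before casting to TIMESTAMP
    "CAST(concat(substring(intake_date_time,7,4),concat(substring(intake_date_time,3,4),concat(substring(intake_date_time,1,2),substring(intake_date_time,11,9)))) AS TIMESTAMP) as intake_date_time")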

// Writing
deltaTableInput1.write
  .mode("overwrite")
  .partitionBy("brand", "model", "year", "month")
  .format("delta")
  .save("path")

ERROR:

com.google.common.util.concurrent.ExecutionError: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
    at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:714)
    at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:676)
    at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:124)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:87)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:189)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:227)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:224)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:185)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:110)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:829)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:829)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:309)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:236)
    ... 47 elided
Caused by: java.lang.NoSuchMethodError: org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
    at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:122)
    at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:120)
    at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
    at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:117)
    at org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:115)
    at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
    at org.apache.spark.sql.delta.DeltaLog.<init>(DeltaLog.scala:79)
    at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$2(DeltaLog.scala:718)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
    at org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$1(DeltaLog.scala:718)
    at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:645)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:103)
    at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:89)
    at org.apache.spark.sql.delta.DeltaLog$.recordDeltaOperation(DeltaLog.scala:645)
    at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:717)
    at org.apache.spark.sql.delta.DeltaLog$$anon$3.call(DeltaLog.scala:714)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
    ... 71 more

In Spark 2.4.4, running from the REPL, the data gets written, but without partitioning.

Raptor0009 asked Nov 06 '22 11:11


1 Answer

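Found on Slack:

Spark 3.0 is significantly different from Spark 2.4, so a Delta build made for Spark 2.4 won't work with it. The NoSuchMethodError in the trace is the symptom: Delta calls Utils.classForName(String), and the Spark 3.0 Preview on the classpath has no method with that signature.

There is a branch for Spark 3.0, though: https://github.com/delta-io/delta/tree/spark-3.0-snapshot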
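
To confirm this kind of mismatch before writing any data, one option is to check by reflection whether the method named in the NoSuchMethodError exists on the running classpath. The following is only a minimal sketch: hasMethod is a hypothetical helper written for this example, not part of Spark or Delta, and it assumes it is pasted into a spark-shell session so that the Spark classes are visible.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark or Delta API): returns true if `className`
// can be found and exposes a public method `methodName` taking exactly
// `paramTypes`. The class is looked up without running its initializer.
def hasMethod(className: String, methodName: String, paramTypes: Class[_]*): Boolean =
  Try {
    val loader = Thread.currentThread().getContextClassLoader
    Class.forName(className, false, loader).getMethod(methodName, paramTypes: _*)
  }.isSuccess

// The signature from the stack trace: Utils$.classForName(String).
// The result is false both when the overload is missing and when Spark
// itself is not on the classpath.
val hasOldClassForName =
  hasMethod("org.apache.spark.util.Utils$", "classForName", classOf[String])

println(s"Utils.classForName(String) available: $hasOldClassForName")
```

If this prints false inside the Spark 3.0 Preview shell, the Delta jar being loaded expects a Spark API that is not there, and the fix is a Delta build made for that Spark version rather than a change to the write code.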

Jacek Laskowski answered Nov 15 '22 13:11