Trouble writing data to Delta Lake in Azure Databricks (Incompatible format detected)

I need to read a dataset into a DataFrame and then write the data to Delta Lake, but I get the following exception:

AnalysisException: Incompatible format detected.

You are trying to write to `dbfs:/user/[email protected]/delta/customer-data/` using Databricks Delta, but there is no
transaction log present. Check the upstream job to make sure that it is writing
using format("delta") and that you are trying to write to the table base path.

To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.azuredatabricks.net/delta/index.html

Here is the code preceding the exception:

from pyspark.sql.types import StructType, StructField, DoubleType, IntegerType, StringType

inputSchema = StructType([
  StructField("InvoiceNo", IntegerType(), True),
  StructField("StockCode", StringType(), True),
  StructField("Description", StringType(), True),
  StructField("Quantity", IntegerType(), True),
  StructField("InvoiceDate", StringType(), True),
  StructField("UnitPrice", DoubleType(), True),
  StructField("CustomerID", IntegerType(), True),
  StructField("Country", StringType(), True)
])

rawDataDF = (spark.read
  .option("header", "true")
  .schema(inputSchema)
  .csv(inputPath)
)

# write to Delta Lake
rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath) 
Themis asked Jul 16 '19

2 Answers

This error message is telling you that there is already data at the destination path (in this case dbfs:/user/[email protected]/delta/customer-data/), and that the existing data is not in the Delta format (i.e. there is no transaction log). You can either choose a new path (which, based on the comments above, it seems you did) or delete that directory and try again.
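
If you go the delete-and-retry route, a minimal sketch could look like the following. It reuses the DataPath variable and rawDataDF from the question, and assumes the directory contains nothing you need to keep:

# WARNING: recursively and permanently deletes everything under DataPath
dbutils.fs.rm(DataPath, True)

# retry the Delta write against the now-empty base path
rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath)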

Michael Armbrust answered Oct 13 '22

I found this question with this search: "You are trying to write to *** using Databricks Delta, but there is no transaction log present."

In case someone searches for the same: for me the solution was to explicitly specify

.write.format("parquet")

because

.format("delta")

has been the default since Databricks Runtime 8.0, and I need "parquet" for legacy reasons.
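
For illustration, applied to the DataFrame from the question this would look roughly as follows (reusing rawDataDF and DataPath from the question; the resulting directory is then plain Parquet and should be read back with format("parquet"), not as a Delta table):

# explicitly write plain Parquet instead of relying on the Delta default format
rawDataDF.write.mode("overwrite").format("parquet").partitionBy("Country").save(DataPath)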

user__42 answered Oct 13 '22