I want to add some columns to a Delta table using Spark SQL, but it shows me an error like:
ALTER ADD COLUMNS does not support datasource table with type org.apache.spark.sql.delta.sources.DeltaDataSource.
You must drop and re-create the table for adding the new columns.
Is there any way to alter my table in Delta Lake?
Thanks a lot for this question! I learnt quite a lot while hunting down a solution 👍
This is Apache Spark 3.2.1 and Delta Lake 1.1.0 (all open source).
The reason for the error is that Spark SQL (3.2.1) supports the ALTER ADD COLUMNS statement for csv, json, parquet, and orc data sources only. For anything else, it throws this exception.
I assume you ran ALTER ADD COLUMNS using SQL (as the root cause would've been caught earlier if you'd used the Scala API or PySpark).
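For reference, here's a minimal way to reproduce the error from PySpark (the table and column names are placeholders I made up):

# Assuming an existing Delta table called delta_demo (name made up):
spark.sql("ALTER TABLE delta_demo ADD COLUMNS (extra_col STRING)")
# => AnalysisException: ALTER ADD COLUMNS does not support datasource
#    table with type org.apache.spark.sql.delta.sources.DeltaDataSource.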
That leads us to org.apache.spark.sql.delta.catalog.DeltaCatalog, which has to be "installed" into Spark SQL for it to recognize Delta Lake as a supported datasource. This is described in the official Quickstart.
For PySpark (on command line) it'd be as follows:
./bin/pyspark \
--packages io.delta:delta-core_2.12:1.1.0 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
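With the session launched like that, the same statement should go through. A quick sanity check (again, the names are made up):

spark.sql("ALTER TABLE delta_demo ADD COLUMNS (extra_col STRING)")
spark.table("delta_demo").printSchema()  # extra_col should now be listed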
In order to extend Spark SQL with Delta Lake's features (incl. ALTER ADD COLUMNS support), you have to set the following configuration properties for DeltaSparkSessionExtension and DeltaCatalog:
spark.sql.extensions
spark.sql.catalog.spark_catalog
They are mandatory in open-source Spark (managed environments like Azure Databricks, which were mentioned as working fine, set them for you, hence no error there).
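If you build the SparkSession yourself (e.g. in a script rather than via bin/pyspark), a minimal sketch of setting the two properties could look like this; the app name is a placeholder, and it still assumes the delta-core package is on the classpath:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("delta-alter-columns")  # placeholder app name
    # The two mandatory properties from the Quickstart:
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

Note that both properties must be in place before the SparkSession is created; setting them on an already-running session has no effect.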