How to ALTER ADD COLUMNS of Delta table?

Tags:

delta-lake

I want to add some columns to a Delta table using Spark SQL, but it shows me an error like:

ALTER ADD COLUMNS does not support datasource table with type org.apache.spark.sql.delta.sources.DeltaDataSource.
You must drop and re-create the table for adding the new columns.

Is there any way to alter my table in Delta Lake?
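For illustration, this is the kind of statement that triggers the error (table and column names here are hypothetical, not from the question):

```python
# Hypothetical table/column names. Without the Delta extension and catalog
# configured, Spark SQL rejects this with the error quoted above.
spark.sql("ALTER TABLE my_delta_table ADD COLUMNS (new_col STRING)")
```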

Yogesh asked Oct 31 '25

1 Answer

Thanks a lot for this question! I learnt quite a lot while hunting down a solution 👍


This is Apache Spark 3.2.1 and Delta Lake 1.1.0 (all open source).


The reason for the error is that Spark SQL (3.2.1) supports the ALTER TABLE ... ADD COLUMNS statement for the csv, json, parquet, and orc data sources only. For any other data source it throws the exception above.

I assume you ran ALTER ADD COLUMNS using SQL (as the root cause would've been caught earlier if you'd used Scala API or PySpark).

That leads us to org.apache.spark.sql.delta.catalog.DeltaCatalog that has to be "installed" to Spark SQL for it to recognize Delta Lake as a supported datasource. This is described in the official Quickstart.

For PySpark (on command line) it'd be as follows:

./bin/pyspark \
  --packages io.delta:delta-core_2.12:1.1.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
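If you create the session in code rather than on the command line, the same properties can be set on the session builder. A minimal sketch (the app name is hypothetical; the package and config values match the command above):

```python
from pyspark.sql import SparkSession

# Build a session with Delta Lake's extension and catalog, equivalent to
# the --packages/--conf flags in the pyspark command above.
spark = (
    SparkSession.builder
    .appName("delta-alter-demo")  # hypothetical app name
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.1.0")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```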

In order to extend Spark SQL with Delta Lake's features (incl. ALTER ADD COLUMNS support), you have to set the following configuration properties for DeltaSparkSessionExtension and DeltaCatalog:

  1. spark.sql.extensions
  2. spark.sql.catalog.spark_catalog

Both are mandatory in open-source Spark. In managed environments like Azure Databricks they are preconfigured for you, which is why the same statement works fine there out of the box.
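With both properties in place, the statement from the question should go through. A sketch, assuming `spark` is a session configured as described above (table and column names are hypothetical):

```python
# `spark` is assumed to be a SparkSession created with the two Delta
# properties above; table/column names are hypothetical.

# Sanity check: the Delta catalog is installed as the session catalog.
assert (spark.conf.get("spark.sql.catalog.spark_catalog")
        == "org.apache.spark.sql.delta.catalog.DeltaCatalog")

# Previously failed with "ALTER ADD COLUMNS does not support ...";
# now adds the new column to the Delta table's schema.
spark.sql("ALTER TABLE my_delta_table ADD COLUMNS (new_col STRING)")
```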

Jacek Laskowski answered Nov 02 '25