How do I add partitioning to an existing Iceberg table that is not partitioned? The table is already loaded with data.
The table was created like this:
import org.apache.iceberg.hive.HiveCatalog
import org.apache.iceberg.catalog._
import org.apache.iceberg.spark.SparkSchemaUtil
import org.apache.iceberg.PartitionSpec
import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.functions.lit

// Build a small test DataFrame: an id column plus a constant "level" column
val df1 = spark
  .range(1000)
  .toDF
  .withColumn("level", lit("something"))

// Create an unpartitioned Iceberg table in the Hive catalog
val catalog = new HiveCatalog(spark.sessionState.newHadoopConf())
val icebergSchema = SparkSchemaUtil.convert(df1.schema)
val icebergTableName = TableIdentifier.of("default", "icebergTab")
val icebergTable = catalog
  .createTable(icebergTableName, icebergSchema, PartitionSpec.unpartitioned)
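For context, the data was appended with the DataFrame writer, roughly like this (a sketch; it assumes the Iceberg Spark runtime is on the classpath and that the Iceberg source resolves Hive table names):

// Append the DataFrame to the new table; "default.icebergTab" is the Hive table name
df1.write
  .format("iceberg")
  .mode(Append)
  .save("default.icebergTab")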
Any suggestions?
Right now, the way to add partitioning is to update the partition spec manually through Iceberg's internal TableOperations API:
import org.apache.iceberg.BaseTable

// Load the table and drop down to its underlying TableOperations
val table = catalog.loadTable(icebergTableName)
val ops = table.asInstanceOf[BaseTable].operations

// Build the new spec, then commit metadata derived from the current metadata
val spec = PartitionSpec.builderFor(table.schema).identity("level").build
val base = ops.current
val newMeta = base.updatePartitionSpec(spec)
ops.commit(base, newMeta)
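After the commit, you can check that the change took effect (a minimal sanity check; refresh() re-reads the table metadata):

// Refresh the table handle and confirm the new partition spec is active
table.refresh()
println(table.spec) // should now include the identity partition on "level"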
There is a pull request to add an operation to make spec changes, like addField("level"), but that isn't quite finished yet. I think it will be in the 0.11.0 release.
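Once that lands, the update would look something like the following (hypothetical until the PR is merged; the method names may change):

// Hypothetical fluent API from the in-progress PR: add an identity partition field
table.updateSpec()
  .addField("level")
  .commit()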
Keep in mind: on an unpartitioned table, INSERT OVERWRITE will replace the whole table. Once the table has a spec, only the partitions with new rows will be replaced. To avoid accidentally overwriting more than you intend, we recommend using the DataFrameWriterV2 interface in Spark, where you can be more explicit about which data values are overwritten.
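For example, with DataFrameWriterV2 (a sketch; assumes a Spark 3 session with an Iceberg catalog configured and the session implicits in scope):

import spark.implicits._

// Overwrite only the rows matching an explicit filter, rather than the whole table
df1.writeTo("default.icebergTab")
  .overwrite($"level" === "something")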