I'd like to add selected columns to a DataFrame if they are not already present.
val columns = List("Col1", "Col2", "Col3")
for (i <- columns)
  if (!df.schema.fieldNames.contains(i))
    df.withColumn(i, lit(0))
When I select columns afterwards, only the old columns show up in the DataFrame; the new columns are not there.
This is more about Scala than Spark: withColumn does not modify the DataFrame in place, it returns a new one, and the loop above throws that result away. It's an excellent case for foldLeft (my favorite!), which threads the updated DataFrame through each step:
import org.apache.spark.sql.functions.lit

// start with an empty DataFrame, but could be anything
val df = spark.emptyDataFrame
val columns = Seq("Col1", "Col2", "Col3")

// thread the DataFrame through the fold, adding each missing column
val columnsAdded = columns.foldLeft(df) { case (d, c) =>
  if (d.columns.contains(c)) {
    // column exists; skip it
    d
  } else {
    // column is not available, so add it with a default of 0
    d.withColumn(c, lit(0))
  }
}
scala> columnsAdded.printSchema
root
|-- Col1: integer (nullable = false)
|-- Col2: integer (nullable = false)
|-- Col3: integer (nullable = false)
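The same fold works unchanged when the DataFrame already has some of the columns: existing ones are simply passed through. Here is a quick sketch; the sample data and column names are made up for illustration, and it assumes you are in spark-shell so spark and its implicits are available:

import org.apache.spark.sql.functions.lit
import spark.implicits._

// hypothetical DataFrame that already contains Col1
val existing = Seq((1, "a"), (2, "b")).toDF("Col1", "Other")
val wanted = Seq("Col1", "Col2", "Col3")

// Col1 is kept as-is; Col2 and Col3 are appended with a default of 0
val result = wanted.foldLeft(existing) { case (d, c) =>
  if (d.columns.contains(c)) d else d.withColumn(c, lit(0))
}

result.show()
// +----+-----+----+----+
// |Col1|Other|Col2|Col3|
// +----+-----+----+----+
// |   1|    a|   0|   0|
// |   2|    b|   0|   0|
// +----+-----+----+----+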
You can also put the column expressions in a sequence and use star expansion:
import org.apache.spark.sql.functions.lit
import spark.implicits._  // for the $"colName" syntax

val df = spark.range(10)
// keep only the names that are not already in the schema
val names = Seq("col1", "col2", "col3").filterNot(df.schema.fieldNames.contains)
// create a literal-0 column for each missing name
val cols = names.map(lit(0).as(_))
// append the new columns to the existing columns
df.select($"*" +: cols: _*)