I have a spark dataframe looks like:
id DataArray
a array(3,2,1)
b array(4,2,1)
c array(8,6,1)
d array(8,2,4)
I want to transform this dataframe into:
id col1 col2 col3
a 3 2 1
b 4 2 1
c 8 6 1
d 8 2 4
What function should I use?
Spark SQL explode function is used to create or split an array or map DataFrame columns to rows. Spark defines several flavors of this function; explode_outer – to handle nulls and empty, posexplode – which explodes with a position of element and posexplode_outer – to handle nulls.
explode (col)[source] Returns a new row for each element in the given array or map. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.
You can select the single or multiple columns of the Spark DataFrame by passing the column names you wanted to select to the select() function. Since DataFrame is immutable, this creates a new DataFrame with a selected columns. show() function is used to show the DataFrame contents.
You can use foldLeft
to add each columnn
fron DataArray
make a list of column names that you want to add
val columns = List("col1", "col2", "col3")
columns.zipWithIndex.foldLeft(df) {
(memodDF, column) => {
memodDF.withColumn(column._1, col("dataArray")(column._2))
}
}
.drop("DataArray")
Hope this helps!
Use apply
:
import org.apache.spark.sql.functions.col
df.select(
col("id") +: (0 until 3).map(i => col("DataArray")(i).alias(s"col$i")): _*
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With