
How to explode an array into multiple columns in Spark

I have a Spark DataFrame that looks like:

id   DataArray
a    array(3,2,1)
b    array(4,2,1)     
c    array(8,6,1)
d    array(8,2,4)

I want to transform this dataframe into:

id  col1  col2  col3
a    3     2     1
b    4     2     1
c    8     6     1 
d    8     2     4

What function should I use?

asked Mar 26 '18 by lserlohn

People also ask

How do you explode an array in Spark?

The Spark SQL explode function is used to split an array or map DataFrame column into rows. Spark defines several flavors of this function: explode_outer handles nulls and empty arrays, posexplode explodes with the position of each element, and posexplode_outer combines both behaviors.

What does explode function do in Spark?

explode(col) returns a new row for each element in the given array or map. It uses the default column name col for elements of an array, and key and value for elements of a map, unless specified otherwise.

How do I select multiple columns in Spark?

You can select one or more columns of a Spark DataFrame by passing the column names you want to the select() function. Since a DataFrame is immutable, this creates a new DataFrame with the selected columns. The show() function displays the DataFrame contents.


2 Answers

You can use foldLeft to add each column from DataArray.

First, make a list of the column names you want to add:

val columns = List("col1", "col2", "col3")

import org.apache.spark.sql.functions.col

val result = columns.zipWithIndex.foldLeft(df) {
  case (memoDF, (name, index)) =>
    memoDF.withColumn(name, col("DataArray")(index))
}.drop("DataArray")
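The zipWithIndex/foldLeft mechanics can be sketched in plain Scala without Spark; here a Map stands in for the DataFrame, and the row values are hypothetical stand-ins for one DataArray value:

```scala
object FoldSketch extends App {
  val columns = List("col1", "col2", "col3")
  val row = Vector(3, 2, 1) // stand-in for one DataArray value

  // zipWithIndex pairs each new column name with the array index it reads from
  val pairs = columns.zipWithIndex // List(("col1", 0), ("col2", 1), ("col3", 2))

  // foldLeft threads an accumulator through the list, the same way the answer
  // threads the DataFrame through successive withColumn calls
  val exploded = pairs.foldLeft(Map.empty[String, Int]) {
    case (acc, (name, index)) => acc + (name -> row(index))
  }
  // exploded: Map("col1" -> 3, "col2" -> 2, "col3" -> 1)
  println(exploded)
}
```

Each step of the fold adds one column, so the accumulator grows exactly as the DataFrame does in the answer above.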

Hope this helps!

answered Oct 11 '22 by koiralo


Use apply:

import org.apache.spark.sql.functions.col

df.select(
  col("id") +: (0 until 3).map(i => col("DataArray")(i).alias(s"col${i + 1}")): _*
)
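How this call assembles its column list can be checked in plain Scala: `(0 until 3).map` builds one entry per array index and `+:` prepends the id column (names here are shifted to start at col1, matching the desired output):

```scala
object SelectSketch extends App {
  // (0 until 3) generates the array indices; +: prepends the id column name.
  // In the answer, each name becomes col("DataArray")(i).alias(...).
  val names = "id" +: (0 until 3).map(i => s"col${i + 1}")
  println(names) // Vector(id, col1, col2, col3)
}
```

The `: _*` in the answer then splats this sequence into select's varargs parameter.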
answered Oct 11 '22 by user9554572