
Exploding column with index

I know that I can "explode" a column of type array like this:

import org.apache.spark.sql._
import org.apache.spark.sql.functions.explode
val explodedDf = 
    payloadLegsDf.withColumn("legs", explode(payloadLegsDf.col("legs")))

Now I have multiple rows; one for each item in the array.

Is there a way to "explode with index", so that a new column contains the index of each item in the original array?

(I can think of hacks to do this: first turn the array field into an array of tuples of the original value and its index, then explode, then unpack the tuples. But is there a more elegant way?)

Paul Reiners asked Jun 21 '18 16:06



1 Answer

If you are using Spark 2.1+, the posexplode function can be used for that:

Creates a new row for each element with position in the given array or map column.

Example:

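import org.apache.spark.sql.functions.posexplode
import spark.implicits._ // for toDF and $; assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  (1L, Array[String]("a", "b")),
  (2L, Array[String]("c", "d"))
).toDF("id", "items")

val res = df.select($"id", posexplode($"items"))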

This will create two new columns, pos for position/index and col for the extracted value:

+---+---+---+
| id|pos|col|
+---+---+---+
|  1|  0|  a|
|  1|  1|  b|
|  2|  0|  c|
|  2|  1|  d|
+---+---+---+
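
If the default names pos and col are not what you want, both generated columns can be aliased in the same select. Below is a minimal sketch of that, assuming a local SparkSession; the names index and item are hypothetical and chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.posexplode

// Local session for illustration only (an assumption); in spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("posexplode-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (1L, Array("a", "b")),
  (2L, Array("c", "d"))
).toDF("id", "items")

// posexplode yields two columns, so pass two aliases at once.
// "index" and "item" are hypothetical names, not Spark defaults.
val named = df.select($"id", posexplode($"items").as(Seq("index", "item")))

named.show()
```

Note that posexplode produces no rows for an id whose array is null or empty. If those rows need to be kept, Spark 2.2+ also provides posexplode_outer, which emits a single row with null position and value instead.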
Antot answered Oct 05 '22 02:10