I know that I can "explode" a column of type array like this:
import org.apache.spark.sql._
import org.apache.spark.sql.functions.explode
val explodedDf =
  payloadLegsDf.withColumn("legs", explode(payloadLegsDf.col("legs")))
Now I have multiple rows; one for each item in the array.
Is there a way I can "explode with index"? So that there will be a new column that contains the index of the item in the original array?
(I can think of hacks to do this: first turn the array field into an array of (value, index) tuples, then explode, then unpack the tuples; see the sketch below. But is there a more elegant way?)
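For concreteness, the hack I have in mind looks roughly like this (a sketch, assuming legs is an array of strings and a SparkSession named spark is in scope; zipWithIndex here is a UDF I would define myself, not a built-in):

import org.apache.spark.sql.functions.{explode, udf}
import spark.implicits._

// Zip each element with its index via a UDF (Scala tuples are encoded
// as structs with fields _1 and _2), explode, then unpack the struct.
val zipWithIndex = udf { xs: Seq[String] => xs.zipWithIndex }

val explodedWithIndex = payloadLegsDf
  .withColumn("leg", explode(zipWithIndex($"legs")))
  .select($"leg._2".as("pos"), $"leg._1".as("leg"))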
If you are using Spark 2.1+, the posexplode function can be used for that:
Creates a new row for each element with position in the given array or map column.
Example:
import org.apache.spark.sql.functions.posexplode
import spark.implicits._ // for toDF and the $-column syntax

val df = Seq(
  (1L, Array[String]("a", "b")),
  (2L, Array[String]("c", "d"))
).toDF("id", "items")

val res = df.select($"id", posexplode($"items"))
This will create two new columns: pos for the position/index and col for the extracted value:
+---+---+---+
| id|pos|col|
+---+---+---+
| 1| 0| a|
| 1| 1| b|
| 2| 0| c|
| 2| 1| d|
+---+---+---+
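If you prefer names other than pos and col, the output of a table-generating function can be aliased in one go (a minimal sketch; index and value are just example names):

// Alias both generated columns at once; Column.as accepts a sequence
// of names for table-generating functions such as posexplode.
val renamed = df.select($"id", posexplode($"items").as(Seq("index", "value")))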