
Explode multiple columns in Spark SQL table

There was a question regarding this issue here:

Explode (transpose?) multiple columns in Spark SQL table

Suppose that we have extra columns as below:

userId    someString      varA       varB         varC     varD
   1      "example1"      [0,2,5]    [1,2,9]      [a,b,c]  [red,green,yellow]
   2      "example2"      [1,20,5]   [9,null,6]   [d,e,f]  [white,black,cyan]

We want to produce an output like the one below:

userId    someString      varA     varB   varC     varD
   1      "example1"       0         1     a       red
   1      "example1"       2         2     b       green
   1      "example1"       5         9     c       yellow
   2      "example2"       1         9     d       white
   2      "example2"       20       null   e       black
   2      "example2"       5         6     f       cyan

The accepted answer there defined a zip udf:

val zip = udf((xs: Seq[Long], ys: Seq[Long]) => xs.zip(ys))

and applied it with withColumn:

df.withColumn("vars", explode(zip($"varA", $"varB"))).select(
   $"userId", $"someString",
   $"vars._1".alias("varA"), $"vars._2".alias("varB")).show
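
Outside of Spark, the pairing that the udf performs is just Scala's own Seq.zip; a minimal sketch (plain Scala, no Spark required, values taken from the varA/varB columns above):

```scala
// Element-wise pairing: exactly what the zip udf computes for one row
val varA = Seq(0L, 2L, 5L)
val varB = Seq(1L, 2L, 9L)

val zipped = varA.zip(varB)
println(zipped) // List((0,1), (2,2), (5,9))
```

explode then turns each pair into its own row, and vars._1 / vars._2 unpack the tuple fields into separate columns.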

If we need to extend that answer to more columns, what is the easiest way to amend the code? Any help please.

Asked Jul 29 '17 by Mohd Zoubi


People also ask

How do I select multiple columns in Spark?

You can select single or multiple columns of a Spark DataFrame by passing the column names you want to the select() function. Since DataFrames are immutable, this creates a new DataFrame with the selected columns; the show() function displays the DataFrame contents.

How do you explode columns in PySpark?

The PySpark function explode(e: Column) is used to explode array or map columns into rows. When an array is passed to this function, it creates a new default column "col" containing all the array elements.

How do I drop multiple columns in Spark DataFrame?

Spark provides the drop() method to remove a column or field from a DataFrame or Dataset. The drop() method can also be used to remove multiple columns at once.

How do you explode in Spark?

Returns a new row for each element in the given array or map. Uses the default column name col for elements in the array and key and value for elements in the map unless specified otherwise.


1 Answer

The approach with the zip udf seems fine, but you need to extend it for more collections. Unfortunately there is no really nice way to zip 4 Seqs, but this should work:

// fail fast if the arrays in one row have different lengths
def assertSameSize(arrs: Seq[_]*) = {
  assert(arrs.map(_.size).distinct.size == 1, "sizes differ")
}

// zip four sequences element-wise into a sequence of 4-tuples
val zip4 = udf((xa: Seq[Long], xb: Seq[Long], xc: Seq[String], xd: Seq[String]) => {
  assertSameSize(xa, xb, xc, xd)
  xa.indices.map(i => (xa(i), xb(i), xc(i), xd(i)))
})
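
The Spark-free core of zip4 can be exercised on its own; a sketch using the sample values from the question (zip4Core is a hypothetical stand-in for the udf body):

```scala
// Same size check as in the answer
def assertSameSize(arrs: Seq[_]*): Unit =
  assert(arrs.map(_.size).distinct.size == 1, "sizes differ")

// Same logic as the udf body: walk the indices and build 4-tuples
def zip4Core(xa: Seq[Long], xb: Seq[Long], xc: Seq[String], xd: Seq[String]) = {
  assertSameSize(xa, xb, xc, xd)
  xa.indices.map(i => (xa(i), xb(i), xc(i), xd(i)))
}

val rows = zip4Core(Seq(0L, 2L, 5L), Seq(1L, 2L, 9L),
                    Seq("a", "b", "c"), Seq("red", "green", "yellow"))
println(rows) // Vector((0,1,a,red), (2,2,b,green), (5,9,c,yellow))
```

With the udf registered, the rest follows the two-column pattern from the question: `df.withColumn("vars", explode(zip4($"varA", $"varB", $"varC", $"varD")))`, then select `$"vars._1"` through `$"vars._4"` with the desired aliases.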
Answered Oct 29 '22 by Raphael Roth