
Unpivot in spark-sql/pyspark


I have a problem statement at hand wherein I want to unpivot a table in spark-sql/pyspark. I have gone through the documentation and I can see there is support only for pivot, but no support for unpivot so far.

Let my initial table look like this:

+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+

When I pivot this in pyspark using the command below:

df.groupBy("A").pivot("B").sum("C") 

I get this as the output:

+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+
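Here is a minimal runnable version of the above, with the sample data written out to match the two tables:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Initial table with columns A, B, C (same data as the first table above)
df = spark.createDataFrame(
    [("G", "X", 4), ("G", "Y", 2), ("H", "Y", 4), ("H", "Z", 5)],
    ["A", "B", "C"])

# Distinct values of B become columns; cells are filled with sum(C)
df.groupBy("A").pivot("B").sum("C").show()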

Now I want to unpivot the pivoted table. In general, this operation may or may not recover the original table, depending on how the original was pivoted: if, say, two rows had shared the same (A, B) pair, sum("C") would have collapsed them into one cell, and the original rows could not be recovered.

Spark-sql as of now doesn't provide out-of-the-box support for unpivot. Is there a way I can achieve this?

asked Feb 26 '17 by Manish Mehra

People also ask

How do you convert columns to rows in Spark SQL?

Spark SQL provides a pivot() function to rotate data from rows into columns. It is an aggregation in which the distinct values of one column are transposed into individual columns. The reverse, converting columns back to rows, is done with the stack function, as in the answer below and the sketch that follows.
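For instance, a minimal sketch of the columns-to-rows direction in plain Spark SQL, run through pyspark. It assumes the pivoted DataFrame from the question is held in a variable named pivoted; that name and the view name are chosen here only for illustration:

# Register the pivoted DataFrame as a temp view (name chosen for illustration)
pivoted.createOrReplaceTempView("pivoted")

# stack() turns the X, Y, Z columns back into (B, C) rows; drop the nulls
spark.sql("""
    SELECT A, B, C
    FROM (
      SELECT A, stack(3, 'X', X, 'Y', Y, 'Z', Z) AS (B, C)
      FROM pivoted
    ) AS t
    WHERE C IS NOT NULL
""").show()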

What is pivot in PySpark?

In PySpark, pivot() rotates the values of one DataFrame column into multiple columns. It is an aggregation applied after groupBy(), combined with an aggregating function such as sum(). Conventionally, the cheaper approach is to supply the pivot values up front rather than let Spark compute them.
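A minimal sketch of that cheaper path, with the column names from the question above; passing the values list is what skips the extra pass to collect the distinct values of B:

# Explicit pivot values avoid a separate distinct-value scan of column B
df.groupBy("A").pivot("B", ["X", "Y", "Z"]).sum("C")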


1 Answer

You can use the built-in stack function, for example in Scala:

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y","Z")
df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]

scala> df.show
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+

scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)")).where("C is not null").show
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+

Or in pyspark:

In [1]: df = spark.createDataFrame([("G",4,2,None),("H",None,4,5)], list("AXYZ"))

In [2]: df.show()
+---+----+---+----+
|  A|   X|  Y|   Z|
+---+----+---+----+
|  G|   4|  2|null|
|  H|null|  4|   5|
+---+----+---+----+

In [3]: df.selectExpr("A", "stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)").where("C is not null").show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
|  G|  X|  4|
|  G|  Y|  2|
|  H|  Y|  4|
|  H|  Z|  5|
+---+---+---+
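If you have many value columns, the stack expression can be generated rather than written by hand. A sketch of a hypothetical helper (the name unpivot and its parameters are mine, not part of the API; note that all value columns must share a compatible type):

def unpivot(df, id_cols, value_cols, var_name="B", value_name="C"):
    # Build "stack(n, 'c1', c1, 'c2', c2, ...) as (var_name, value_name)"
    pairs = ", ".join("'{0}', `{0}`".format(c) for c in value_cols)
    stacked = "stack({0}, {1}) as ({2}, {3})".format(
        len(value_cols), pairs, var_name, value_name)
    return (df.selectExpr(*(id_cols + [stacked]))
              .where("{0} is not null".format(value_name)))

unpivot(df, ["A"], ["X", "Y", "Z"]).show()   # same output as In [3] above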
answered Sep 17 '22 by Andrew Ray