I need to unpivot a table in Spark SQL / PySpark. I have gone through the documentation and could find support only for pivot, with no support for unpivot so far. Is there a way I can achieve this?
Let my initial table look like this:
When I pivot this in PySpark using the following command:
df.groupBy("A").pivot("B").sum("C")
I get this as the output:
Now I want to unpivot the pivoted table. In general, this operation may or may not yield the original table, depending on how the original table was pivoted.
As far as I can tell, Spark SQL does not currently provide out-of-the-box support for unpivot. Is there a way I can achieve this?
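To illustrate why the round trip may not give back the original table, here is a small plain-Python sketch that mimics a pivot with a sum aggregation followed by an unpivot. The sample rows and the helper names `pivot_sum` and `unpivot_rows` are made up for illustration and are not Spark APIs.

```python
from collections import defaultdict

# Hypothetical sample data as (A, B, C) rows; note the two rows for ("G", "X").
rows = [("G", "X", 1), ("G", "X", 3), ("G", "Y", 2), ("H", "Z", 5)]


def pivot_sum(rows):
    """Mimic groupBy("A").pivot("B").sum("C") with nested dicts."""
    table = defaultdict(dict)
    for a, b, c in rows:
        table[a][b] = table[a].get(b, 0) + c
    return {a: dict(cells) for a, cells in table.items()}


def unpivot_rows(table):
    """Turn the nested dict back into (A, B, C) rows."""
    return [(a, b, c) for a, cells in table.items() for b, c in cells.items()]


pivoted = pivot_sum(rows)
restored = unpivot_rows(pivoted)
```

Here the two `("G", "X")` rows are summed into a single cell during the pivot, so `restored` has three rows while `rows` has four. The unpivot can only recover the aggregated values.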
Spark SQL provides the pivot() function to rotate data from one column into multiple columns (transposing rows to columns). It is an aggregation in which the distinct values of one of the grouping columns are transposed into individual columns.
In PySpark, pivot is applied to grouped data, as in df.groupBy("A").pivot("B").sum("C"), and it always needs an aggregation to combine the rows that fall into the same output cell.
You can use the built-in stack function, for example in Scala:
    scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y", "Z")
    df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]

    scala> df.show
    +---+----+---+----+
    |  A|   X|  Y|   Z|
    +---+----+---+----+
    |  G|   4|  2|null|
    |  H|null|  4|   5|
    +---+----+---+----+

    scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)")).where("C is not null").show
    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  G|  X|  4|
    |  G|  Y|  2|
    |  H|  Y|  4|
    |  H|  Z|  5|
    +---+---+---+
Or in pyspark:
    In [1]: df = spark.createDataFrame([("G",4,2,None),("H",None,4,5)],list("AXYZ"))

    In [2]: df.show()
    +---+----+---+----+
    |  A|   X|  Y|   Z|
    +---+----+---+----+
    |  G|   4|  2|null|
    |  H|null|  4|   5|
    +---+----+---+----+

    In [3]: df.selectExpr("A", "stack(3, 'X', X, 'Y', Y, 'Z', Z) as (B, C)").where("C is not null").show()
    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  G|  X|  4|
    |  G|  Y|  2|
    |  H|  Y|  4|
    |  H|  Z|  5|
    +---+---+---+
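When there are many columns to unpivot, writing the stack expression by hand gets tedious. The following is a minimal sketch that builds the same expression from a list of column names. The helpers `stack_expr` and `unpivot` are hypothetical names introduced here, not part of PySpark. The sketch assumes the column names contain no quotes or backticks and that all value columns share a compatible type, since stack puts them into a single output column.

```python
def stack_expr(value_cols, key_name="B", value_name="C"):
    """Build a stack(...) expression like the hand-written one above.

    Assumes column names contain no quotes or backticks.
    """
    if not value_cols:
        raise ValueError("need at least one column to unpivot")
    # Each column contributes a pair: its name as a literal, then its value.
    pairs = ", ".join(f"'{c}', `{c}`" for c in value_cols)
    return f"stack({len(value_cols)}, {pairs}) as ({key_name}, {value_name})"


def unpivot(df, id_cols, value_cols, key_name="B", value_name="C"):
    """Apply the expression to a DataFrame and drop the null cells."""
    expr = stack_expr(value_cols, key_name, value_name)
    return df.selectExpr(*id_cols, expr).where(f"{value_name} is not null")
```

With the DataFrame from the session above, `unpivot(df, ["A"], ["X", "Y", "Z"]).show()` should run the same query as the hand-written version. On Spark 3.4 and later, `DataFrame.unpivot` (with `melt` as an alias) is also available as a built-in method; unlike the sketch here, it does not filter out null values on its own.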