I have a DataFrame formated as below:
+---+------------------------------------------------------+
|Id |DateInfos |
+---+------------------------------------------------------+
|B |[[3, 19/06/2012-02.42.01], [4, 17/06/2012-18.22.21]] |
|A |[[1, 15/06/2012-18.22.16], [2, 15/06/2012-09.22.35]] |
|C |[[5, 14/06/2012-05.20.01]] |
+---+------------------------------------------------------+
I would like to sort each element of DateInfos column by date with the timestamp in the second element of my Array
+---+------------------------------------------------------+
|Id |DateInfos |
+---+------------------------------------------------------+
|B |[[4, 17/06/2012-18.22.21], [3, 19/06/2012-02.42.01]] |
|A |[[2, 15/06/2012-09.22.35], [1, 15/06/2012-18.22.16]] |
|C |[[5, 14/06/2012-05.20.01]] |
+---+------------------------------------------------------+
the schema of my DataFrame is printed as below:
root
|-- C1: string (nullable = true)
|-- C2: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: integer (nullable = false)
| | |-- _2: string (nullable = false)
I assume I have to create an udf which use a function with the following signature:
def sort_by_date(mouvements : Array[Any]) : Array[Any]
Do you have any idea?
That's indeed a bit tricky - because although the UDF's input and output types seem identical, we can't really define it that way - because the input is actually a mutable.WrappedArray[Row]
and the output can't use Row
or else Spark will fail to decode it into a Row...
So we define a UDF that takes a mutable.WrappedArray[Row]
and returns an Array[(Int, String)]
:
val sortDates = udf { arr: mutable.WrappedArray[Row] =>
arr.map { case Row(i: Int, s: String) => (i, s) }.sortBy(_._2)
}
val result = input.select($"Id", sortDates($"DateInfos") as "DateInfos")
result.show(truncate = false)
// +---+--------------------------------------------------+
// |Id |DateInfos |
// +---+--------------------------------------------------+
// |B |[[4,17/06/2012-18.22.21], [3,19/06/2012-02.42.01]]|
// |A |[[2,15/06/2012-09.22.35], [1,15/06/2012-18.22.16]]|
// |C |[[5,14/06/2012-05.20.01]] |
// +---+--------------------------------------------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With