Let's assume that I have the following Dataset:
+-----------+----------+
|productCode|    amount|
+-----------+----------+
|      XX-13|       300|
|       XX-1|       250|
|       XX-2|       410|
|       XX-9|        50|
|      XX-10|        35|
|     XX-100|       870|
+-----------+----------+
Where productCode is a String and amount is an Int.
If I try to order this by productCode, the result is the following (and this is expected, given the nature of String comparison):
def orderProducts(product: Dataset[Product]): Dataset[Product] = {
  product.orderBy("productCode")
}
// Output:
+-----------+----------+
|productCode|    amount|
+-----------+----------+
|       XX-1|       250|
|      XX-10|        35|
|     XX-100|       870|
|      XX-13|       300|
|       XX-2|       410|
|       XX-9|        50|
+-----------+----------+
How can I get the output ordered by the integer part of productCode, as shown below, using the Dataset API?
+-----------+----------+
|productCode|    amount|
+-----------+----------+
|       XX-1|       250|
|       XX-2|       410|
|       XX-9|        50|
|      XX-10|        35|
|      XX-13|       300|
|     XX-100|       870|
+-----------+----------+
Use an expression in the orderBy. Check this out:
scala> val df = Seq(("XX-13",300),("XX-1",250),("XX-2",410),("XX-9",50),("XX-10",35),("XX-100",870)).toDF("productCode", "amt")
df: org.apache.spark.sql.DataFrame = [productCode: string, amt: int]
scala> df.orderBy(split('productCode,"-")(1).cast("int")).show
+-----------+---+
|productCode|amt|
+-----------+---+
|       XX-1|250|
|       XX-2|410|
|       XX-9| 50|
|      XX-10| 35|
|      XX-13|300|
|     XX-100|870|
+-----------+---+
scala>
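The same expression also works with the typed Dataset API from the question, since orderBy accepts Column expressions and returns a Dataset of the same type. Here is a minimal sketch, assuming a Product case class that matches the schema in the question:

import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.split

// Assumed to match the question's schema.
case class Product(productCode: String, amount: Int)

def orderProducts(product: Dataset[Product]): Dataset[Product] = {
  // Sort by the numeric part after the "-"; orderBy(Column*) keeps the Dataset[Product] type.
  product.orderBy(split(product("productCode"), "-")(1).cast("int"))
}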
With window functions, you could do it like this (note that org.apache.spark.sql.expressions.Window needs to be imported first):
scala> df.withColumn("row1",row_number().over(Window.orderBy(split('productCode,"-")(1).cast("int")))).show(false)
18/12/10 09:25:07 WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
+-----------+---+----+
|productCode|amt|row1|
+-----------+---+----+
|XX-1       |250|1   |
|XX-2       |410|2   |
|XX-9       |50 |3   |
|XX-10      |35 |4   |
|XX-13      |300|5   |
|XX-100     |870|6   |
+-----------+---+----+
scala>
Note that Spark warns about moving all the data to a single partition, because no partition is defined for the window.
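If you do need the row numbers but want to avoid a single partition, one option is to partition the window by the non-numeric prefix. This is only a sketch under the assumption that the part before the dash is a meaningful grouping key; row numbers are then assigned per prefix rather than globally:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, split}

// Partition by the code prefix (e.g. "XX") so Spark does not have to move
// every row to one partition; numbering restarts for each distinct prefix.
val byPrefix = Window
  .partitionBy(split(col("productCode"), "-")(0))
  .orderBy(split(col("productCode"), "-")(1).cast("int"))

df.withColumn("row1", row_number().over(byPrefix)).show(false)

With the sample data every code shares the "XX" prefix, so the output matches the one above; the approach only pays off when there is more than one distinct prefix.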