 

Spark dataframe decimal precision

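I have a DataFrame:

import org.apache.spark.sql.functions._   // sum, desc, lead
import spark.implicits._                  // enables the $"col" syntax
val groupby = df.groupBy($"column1", $"Date")
    .agg(sum("amount").as("amount"))
    .orderBy($"column1", desc("Date"))   // only column1, Date and amount exist after the aggregation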

When I apply a window function to add a new column named difference:

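import org.apache.spark.sql.expressions.Window
val windowspec = Window.partitionBy("column1").orderBy(desc("Date"))

// lead returns the amount of the next row in the window (default 0 for the last row)
groupby.withColumn("difference", lead($"amount", 1, 0).over(windowspec)).show()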


+--------+------------+-----------+--------------------------+
| Column | Date       | Amount    | Difference               |
+--------+------------+-----------+--------------------------+
| A      | 3/31/2017  | 12345.45  | 3456.540000000000000000  |
+--------+------------+-----------+--------------------------+
| A      | 2/28/2017  | 3456.54   | 34289.430000000000000000 |
+--------+------------+-----------+--------------------------+
| A      | 1/31/2017  | 34289.43  | 45673.987000000000000000 |
+--------+------------+-----------+--------------------------+
| A      | 12/31/2016 | 45673.987 | 0.00E+00                 |
+--------+------------+-----------+--------------------------+

I'm getting decimals with trailing zeros. When I call printSchema() on the above DataFrame, the data type of difference is decimal(38,18). Can someone tell me how to change the data type to decimal(38,2), or otherwise remove the trailing zeros?
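A minimal self-contained reproduction of where decimal(38,18) can come from. It assumes a local SparkSession and builds amount from Scala BigDecimal values, which Spark encodes as decimal(38,18) by default; the question does not say how the real amount column is typed, so that part is an assumption. The rows are the ones from the table above, and ReproduceScale and withLead are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

object ReproduceScale {
  // Hypothetical helper: the question's groupBy + lead, without any cast
  def withLead(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Assumption: amount starts as scala.math.BigDecimal, encoded as decimal(38,18)
    val df = Seq(
      ("A", java.sql.Date.valueOf("2017-03-31"), BigDecimal("12345.45")),
      ("A", java.sql.Date.valueOf("2017-02-28"), BigDecimal("3456.54")),
      ("A", java.sql.Date.valueOf("2017-01-31"), BigDecimal("34289.43")),
      ("A", java.sql.Date.valueOf("2016-12-31"), BigDecimal("45673.987"))
    ).toDF("column1", "Date", "amount")

    // sum over decimal(38,18) stays decimal(38,18): precision is capped at 38, scale is kept
    val groupby = df.groupBy($"column1", $"Date").agg(sum("amount").as("amount"))
    val windowspec = Window.partitionBy("column1").orderBy(desc("Date"))
    // lead returns the same type as its input column; the default 0 is cast to that type
    groupby.withColumn("difference", lead($"amount", 1, 0).over(windowspec))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("repro").getOrCreate()
    val result = withLead(spark)
    result.printSchema()
    // true: the trailing zeros are just the 18-digit scale being displayed
    println(result.schema("difference").dataType == DecimalType(38, 18))
    spark.stop()
  }
}
```

In other words, the extra zeros are not extra data; they come from the column's scale of 18.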

chinkrishna asked Jun 19 '26 15:06


1 Answer

You can cast the result to a decimal type with the precision and scale you need, like below:

import org.apache.spark.sql.types.DataTypes
lead($"amount", 1, 0).over(windowspec).cast(DataTypes.createDecimalType(32, 2))
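A fuller, self-contained sketch of this cast, assuming a local SparkSession and an amount column built from Scala BigDecimal values (which Spark encodes as decimal(38,18)); CastLead and withLead are hypothetical names. It uses decimal(38,2), which is what the question asks for; the decimal(32,2) above behaves the same way as long as the values fit in 32 digits.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DataTypes

object CastLead {
  // Hypothetical helper: the question's pipeline with the window result cast to decimal(38,2)
  def withLead(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Assumption: sample rows from the question, amount encoded as decimal(38,18)
    val df = Seq(
      ("A", java.sql.Date.valueOf("2017-03-31"), BigDecimal("12345.45")),
      ("A", java.sql.Date.valueOf("2017-02-28"), BigDecimal("3456.54")),
      ("A", java.sql.Date.valueOf("2017-01-31"), BigDecimal("34289.43")),
      ("A", java.sql.Date.valueOf("2016-12-31"), BigDecimal("45673.987"))
    ).toDF("column1", "Date", "amount")

    val groupby = df.groupBy($"column1", $"Date").agg(sum("amount").as("amount"))
    val windowspec = Window.partitionBy("column1").orderBy(desc("Date"))

    // Cast the lead result so the new column is decimal(38,2) instead of decimal(38,18)
    groupby.withColumn(
      "difference",
      lead($"amount", 1, 0).over(windowspec).cast(DataTypes.createDecimalType(38, 2)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cast-lead").getOrCreate()
    val result = withLead(spark)
    result.printSchema()   // difference: decimal(38,2)
    result.orderBy(desc("Date")).show()
    spark.stop()
  }
}
```

Reducing the scale rounds rather than truncates, so 45673.987 comes out as 45673.99. If the amount column itself should also have two decimals, another option is to cast amount right after the aggregation; lead then inherits decimal(38,2) without a second cast.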
magic answered Jun 21 '26 07:06


