I'm doing some testing of Spark decimal types for currency measures and am seeing odd precision results when I set the scale and precision as shown below. I want to be sure that I won't have any data loss during calculations, but the example below is not reassuring. Can anyone tell me why this is happening with Spark SQL? I'm currently on version 2.3.0.
val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show
This returns
+----------------+
| val|
+----------------+
|0.33333300000000|
+----------------+
This is a currently open issue; see SPARK-27089. The suggested workaround is to adjust the setting below. I validated that the SQL statement works as expected with this setting set to false.
spark.sql.decimalOperations.allowPrecisionLoss=false
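As a minimal sketch (assuming an interactive spark-shell session; the property can also be passed with --conf at launch), you can set it at runtime before running the statement:

// Disable the lossy precision adjustment for decimal arithmetic
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

// Re-run the original statement; per the workaround above it should now
// return the result without the trailing zeros shown in the question.
val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show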
Use BigDecimal to avoid precision loss. See Double vs. BigDecimal? for background.
Example:
scala> val df = Seq(BigDecimal("0.03"),BigDecimal("8.20"),BigDecimal("0.02")).toDS
df: org.apache.spark.sql.Dataset[scala.math.BigDecimal] = [value: decimal(38,18)]
scala> df.select($"value").show
+--------------------+
| value|
+--------------------+
|0.030000000000000000|
|8.200000000000000000|
|0.020000000000000000|
+--------------------+
Using BigDecimal:
scala> df.select($"value" + BigDecimal("0.1")).show
+-------------------+
| (value + 0.1)|
+-------------------+
|0.13000000000000000|
|8.30000000000000000|
|0.12000000000000000|
+-------------------+
If you don't use BigDecimal, there will be a loss of precision. In this case, 0.1 is a double:
scala> df.select($"value" + lit(0.1)).show
+-------------------+
| (value + 0.1)|
+-------------------+
| 0.13|
| 8.299999999999999|
|0.12000000000000001|
+-------------------+
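As an alternative sketch (not part of the original answer, and untested here), you can also cast the literal to a decimal type explicitly so the addition stays in decimal arithmetic rather than falling back to double:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DecimalType

// Cast the literal to a decimal instead of leaving it as a double,
// so the '+' is evaluated with decimal semantics.
df.select($"value" + lit("0.1").cast(DecimalType(38, 18))).show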