I'm doing some testing of Spark decimal types for currency measures and am seeing odd precision results when I set the scale and precision as shown below. I want to be sure that I won't have any data loss during calculations, but the example below is not reassuring. Can anyone tell me why this is happening with Spark SQL? I'm currently on version 2.3.0.
val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show
This returns
+----------------+
| val|
+----------------+
|0.33333300000000|
+----------------+
This is a currently open issue; see SPARK-27089. The suggested workaround is to adjust the setting below. I validated that the SQL statement works as expected with this setting set to false.
spark.sql.decimalOperations.allowPrecisionLoss=false
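As a minimal sketch (assuming an interactive spark-shell session; the property can also be passed with --conf at launch), you can set it at runtime before running the statement:

// Disable the lossy precision adjustment for decimal arithmetic
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")

// Re-run the original statement; per the workaround above it should now
// return the result without the trailing zeros shown in the question.
val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14)) as decimal(38,14)) val"""
spark.sql(sql).show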
Use BigDecimal to avoid precision loss. See Double vs. BigDecimal? for background.
Example:
scala> val df = Seq(BigDecimal("0.03"),BigDecimal("8.20"),BigDecimal("0.02")).toDS
df: org.apache.spark.sql.Dataset[scala.math.BigDecimal] = [value: decimal(38,18)]
scala> df.select($"value").show
+--------------------+
| value|
+--------------------+
|0.030000000000000000|
|8.200000000000000000|
|0.020000000000000000|
+--------------------+
Using BigDecimal:
scala> df.select($"value" + BigDecimal("0.1")).show
+-------------------+
| (value + 0.1)|
+-------------------+
|0.13000000000000000|
|8.30000000000000000|
|0.12000000000000000|
+-------------------+
If you don't use BigDecimal, there will be a loss of precision. In this case, 0.1 is a double:
scala> df.select($"value" + lit(0.1)).show
+-------------------+
| (value + 0.1)|
+-------------------+
| 0.13|
| 8.299999999999999|
|0.12000000000000001|
+-------------------+
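As an alternative sketch (not part of the original answer, and untested here), you can also cast the literal to a decimal type explicitly so the addition stays in decimal arithmetic rather than falling back to double:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DecimalType

// Cast the literal to a decimal instead of leaving it as a double,
// so the '+' is evaluated with decimal semantics.
df.select($"value" + lit("0.1").cast(DecimalType(38, 18))).show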