Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scala.math.BigDecimal : 1.2 and 1.20 are equal

How to keep precision and the trailing zero while converting a Double or a String to scala.math.BigDecimal ?

Use Case - In a JSON message, an attribute is of type String and has a value of "1.20". But while reading this attribute in Scala and converting it to a BigDecimal, I am loosing the precision and it is converted to 1.2

Scala REPL screenshot

like image 533
Saurabh Avatar asked Dec 10 '22 01:12

Saurabh


2 Answers

@Saurabh What a nice question! It is crucial that you shared the use case!

I think my answer lets to solve it in a most safe and efficient way... In a short form it is:

Use jsoniter-scala for parsing BigDecimal values precisely.

Encoding/decoding to/from JSON strings for any numeric type can by defined per codec or per class field basis. Please see code bellow:

Add dependencies into your build.sbt:

libraryDependencies ++= Seq(
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core"   % "2.17.4",
  "com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "2.17.4" % Provided // required only in compile-time
)

Define data structures, derive a codec for the root structure, parse the response body and serialize it back:

import com.github.plokhotnyuk.jsoniter_scala.core._
import com.github.plokhotnyuk.jsoniter_scala.macros._

case class Response(
  amount: BigDecimal,
  @stringified price: BigDecimal)
    
implicit val codec: JsonValueCodec[Response] = JsonCodecMaker.make {
  CodecMakerConfig
    .withIsStringified(true) // switch it on to stringify all numeric and boolean values in this codec
    .withBigDecimalPrecision(34) // set a precision to round up to decimal128 format: java.math.MathContext.DECIMAL128.getPrecision
    .withBigDecimalScaleLimit(6178) // limit scale to fit the decimal128 format: BigDecimal("0." + "0" * 33 + "1e-6143", java.math.MathContext.DECIMAL128).scale + 1
    .withBigDecimalDigitsLimit(308) // limit a number of mantissa digits to be parsed before rounding with the specified precision
}
  
val response = readFromArray("""{"amount":1000,"price":"1.20"}""".getBytes("UTF-8"))
val json = writeToArray(Response(amount = BigDecimal(1000), price = BigDecimal("1.20")))

Print results to the console and verify them:

println(response)
println(new String(json, "UTF-8"))

Response(1000,1.20)
{"amount":1000,"price":"1.20"}   

Why the proposed approach is safe?

Well... Parsing of JSON is a minefield, especially when you are going to have precise BigDecimal values after that. Most JSON parsers for Scala do it using Java's constructor for string representation which has O(n^2) complexity (where n is a number of digits in the mantissa) and do not round results to the safe option of MathContext (by default the MathContext.DECIMAL128 value is used for that in Scala's BigDecimal constructors and operations).

It introduces vulnerabilities under low bandwidth DoS/DoW attacks for systems that accept untrusted input. Below is a simple example how it can be reproduced in Scala REPL with the latest version of the most popular JSON parser for Scala in the classpath:

...
Starting scala interpreter...
Welcome to Scala 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222).
Type in expressions for evaluation. Or try :help.

scala> def timed[A](f: => A): A = { val t = System.currentTimeMillis; val r = f; println(s"Elapsed time (ms): ${System.currentTimeMillis - t}"); r } 
timed: [A](f: => A)A

scala> timed(io.circe.parser.decode[BigDecimal]("9" * 1000000))
Elapsed time (ms): 29192
res0: Either[io.circe.Error,BigDecimal] = Right(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999...

scala> timed(io.circe.parser.decode[BigDecimal]("1e-100000000").right.get + 1)
Elapsed time (ms): 87185
res1: scala.math.BigDecimal = 1.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000...

For contemporary 1Gbit networks 10ms of receiving a malicious message with the 1M-digit number can produce 29 seconds of 100% CPU load on a single core. More than 256 cores can be effectively DoS-ed at the full bandwidth rate. The last expression demonstrates how to burn a CPU core for ~1.5 minutes using a message with a 13-byte number if subsequent + or - operations were used with Scala 2.12.8.

And, jsoniter-scala take care about all these cases for Scala 2.11.x, 2.12.x, 2.13.x, and 3.x.

Why it is the most efficient?

Below are charts with throughput (operations per second, so greater is better) results of JSON parsers for Scala on different JVMs during parsing of an array of 128 small (up to 34-digit mantissas) values and a medium (with a 128-digit mantissa) value of BigDecimal accordingly:

enter image description here

enter image description here

The parsing routine for BigDecimal in jsoniter-scala:

  • uses BigDecimal values with compact representation for small numbers up to 36 digits

  • uses more efficient hot-loops for medium numbers that have from 37 to 284 digits

  • switches to the recursive algorithm which has O(n^1.5) complexity for values that have more than 285 digits

Moreover, jsoniter-scala parses and serializes JSON directly from UTF-8 bytes to your data structures and back, and does it crazily fast without using of run-time reflection, intermediate ASTs, strings or hash maps, with minimum allocations and copying. Please see here the results of 115 benchmarks for different data types and real-life message samples for GeoJSON, Google Maps API, OpenRTB, and Twitter API.

like image 136
Andriy Plokhotnyuk Avatar answered Dec 23 '22 11:12

Andriy Plokhotnyuk


For Double, 1.20 is exactly the same as 1.2, so you can't convert them to different BigDecimals. For String, you are not losing precision; you can see that because res3: scala.math.BigDecimal = 1.20 and not ... = 1.2! But equals on scala.math.BigDecimal happens to be defined so that numerically equal BigDecimals are equal even though they are distinguishable.

If you want to avoid that, you could use java.math.BigDecimals for which

Unlike compareTo, this method considers two BigDecimal objects equal only if they are equal in value and scale (thus 2.0 is not equal to 2.00 when compared by this method).

For your case, res2.underlying == res3.underlying will be false.

Of course, its documentation also states

Note: care should be exercised if BigDecimal objects are used as keys in a SortedMap or elements in a SortedSet since BigDecimal's natural ordering is inconsistent with equals. See Comparable, SortedMap or SortedSet for more information.

which is probably part of the reason why the Scala designers decided on different behavior.

like image 33
Alexey Romanov Avatar answered Dec 23 '22 10:12

Alexey Romanov