Fast and safe conversion from string to numeric types

Tags: scala

What would be a fast and safe way to convert a String to a numeric type, while providing a default value when the conversion fails?

I tried the commonly recommended approach, which relies on exceptions:

implicit class StringConversion(val s: String) {

  private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {
    convert(s)
  } catch {
    case _: NumberFormatException => defaultVal
  }

  def toShortOrElse(defaultVal: Short = 0) = toTypeOrElse[Short](_.toShort, defaultVal)
  def toByteOrElse(defaultVal: Byte = 0) = toTypeOrElse[Byte](_.toByte, defaultVal)
  def toIntOrElse(defaultVal: Int = 0) = toTypeOrElse[Int](_.toInt, defaultVal)
  def toDoubleOrElse(defaultVal: Double = 0D) = toTypeOrElse[Double](_.toDouble, defaultVal)
  def toLongOrElse(defaultVal: Long = 0L) = toTypeOrElse[Long](_.toLong, defaultVal)
  def toFloatOrElse(defaultVal: Float = 0F) = toTypeOrElse[Float](_.toFloat, defaultVal)
}

Using this utility class, I can now easily convert any String to a given numeric type, and provide a default value in case the String does not correctly represent that numeric type:

scala> "123".toIntOrElse()
res1: Int = 123
scala> "abc".toIntOrElse(-1)
res2: Int = -1
scala> "abc".toIntOrElse()
res3: Int = 0
scala> "3.14159".toDoubleOrElse()
res4: Double = 3.14159
...

While it works beautifully, this approach does not seem to scale well when conversions fail, probably because of the cost of throwing and catching exceptions:

scala> for (i<-1 to 10000000) "1234".toIntOrElse()

takes roughly 1 second to execute whereas

scala> for (i<-1 to 10000000) "abcd".toIntOrElse()

takes roughly 1 minute!

I guess another approach would be to avoid relying on exceptions being triggered by the toInt, toDouble, ... methods.

Could this be achieved by checking whether a String "is of the given type"? One could of course iterate through the String's characters and check that they are digits (see e.g. this example), but then what about the other numeric formats (double, float, hex, octal, ...)?

asked May 08 '14 12:05 by borck


1 Answer

As a first approach, filter out input strings that do not contain any digit before attempting the conversion:

private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {
  // String.contains matches its argument literally, so test for a digit character instead
  if (s.exists(_.isDigit)) convert(s)
  else defaultVal
} catch {
  case _: NumberFormatException => defaultVal
}

Update

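An enriched set of characters that may occur in a numeric value. Neither the order of occurrence nor the number of repetitions is checked, so strings such as "1.2.3" or "+-e" still reach the conversion and fall through to the catch clause:

private def toTypeOrElse[T](convert: String=>T, defaultVal: T) = try {
  // Only attempt the conversion when every character could belong to a number
  if (s matches "[\\+\\-0-9.e]+") convert(s)
  else defaultVal
} catch {
  case _: NumberFormatException => defaultVal
}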
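The filters above only reduce how often the exception path is hit. If Scala 2.13 or later is available (an assumption, since the question predates it), the standard library's toIntOption, toDoubleOption and related methods on String return None for malformed input instead of throwing. The sketch below rebuilds the question's helpers on top of them; the wrapper object SafeConversions is a hypothetical name chosen for illustration, and I have not benchmarked it against the versions above.

```scala
// Sketch only: assumes Scala 2.13+, where String gains toIntOption, toDoubleOption, etc.
// SafeConversions is a hypothetical wrapper name; the *OrElse names mirror the question.
object SafeConversions {
  implicit class StringConversion(val s: String) extends AnyVal {
    // Each helper asks the standard library for an Option and falls back to the default on None
    def toByteOrElse(defaultVal: Byte = 0): Byte = s.toByteOption.getOrElse(defaultVal)
    def toShortOrElse(defaultVal: Short = 0): Short = s.toShortOption.getOrElse(defaultVal)
    def toIntOrElse(defaultVal: Int = 0): Int = s.toIntOption.getOrElse(defaultVal)
    def toLongOrElse(defaultVal: Long = 0L): Long = s.toLongOption.getOrElse(defaultVal)
    def toFloatOrElse(defaultVal: Float = 0F): Float = s.toFloatOption.getOrElse(defaultVal)
    def toDoubleOrElse(defaultVal: Double = 0D): Double = s.toDoubleOption.getOrElse(defaultVal)
  }
}
```

After import SafeConversions._, the calls from the question read the same: "abc".toIntOrElse(-1) yields -1 and "123".toIntOrElse() yields 123. Hex and octal literals are not covered by these methods and would still need separate handling.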
answered Sep 28 '22 07:09 by elm