Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance difference in toString.map and toString.toArray.map

While coding Euler problems, I ran across what I think is bizarre:

The method toString.map is slower than toString.toArray.map.

Here's an example:

def main(args: Array[String]) 
{
    def toDigit(num : Int) = num.toString.map(_ - 48) //2137 ms
    def toDigitFast(num : Int) = num.toString.toArray.map(_ - 48) //592 ms

    val startTime = System.currentTimeMillis;

    (1 to 1200000).map(toDigit)

    println(System.currentTimeMillis - startTime)
}

Shouldn't the method map on String fallback to a map over the array? Why is there such a noticeable difference? (Note that increasing the number even causes an stack overflow on the non-array case).

like image 428
Vinicius Seufitele Avatar asked Feb 21 '23 10:02

Vinicius Seufitele


1 Answers

Original

Could be because toString.map uses the WrappedString implicit, while toString.toArray.map uses the WrappedArray implicit to resolve map.

Let's see map, as defined in TraversableLike:

def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
  val b = bf(repr)
  b.sizeHint(this)
  for (x <- this) b += f(x)
  b.result
}

WrappedString uses a StringBuilder as builder:

def +=(x: Char): this.type = { append(x); this }

def append(x: Any): StringBuilder = {
  underlying append String.valueOf(x)
  this
}

The String.valueOf call for Any uses Java Object.toString on the Char instances, possibly getting boxed first. These extra ops might be the cause of speed difference, versus the supposedly shorter code paths of the Array builder.

This is a guess though, would have to measure.

Edit

After revising, the general point still stands, but the I referred the wrong implicits, since the toDigit methods return an Int sequence (or like), not a translated string as I misread.

toDigit uses LowPriorityImplicits.fallbackStringCanBuildFrom[T]: CanBuildFrom[String, T, immutable.IndexedSeq[T]], with T = Int, which just defers to a general IndexedSeq builder.

toDigitFast uses a direct Array implicit of type CanBuildFrom[Array[_], T, Array[T]], which is unarguably faster.

Passing the following CBF for toDigit explicitly makes the two methods on par:

object FastStringToArrayBuild {

  def canBuildFrom[T : ClassManifest] = new CanBuildFrom[String, T, Array[T]] {
    private def newBuilder = scala.collection.mutable.ArrayBuilder.make()
    def apply(from: String) = newBuilder
    def apply() = newBuilder
  }  

}
like image 50
ron Avatar answered Mar 02 '23 01:03

ron