
Summing Clojure Ratios is slow

I'm summing a long list of Ratios in Clojure, something like:

(defn sum-ratios
  [n]
  (reduce
    (fn [total ind]
      (+
        total
        (/
          (inc (rand-int 100))
          (inc (rand-int 100)))))
    (range 0 n)))

The runtime for various n is:

  • n = 10^4 ...... 41 ms
  • n = 10^6 ...... 3.4 s
  • n = 10^7 ...... 36 s


The (less precise) alternative is to sum these values as doubles:

(defn sum-doubles
  [n]
  (reduce
    (fn [total ind]
      (+
        total
        (double
          (/
            (inc (rand-int 100))
            (inc (rand-int 100))))))
    (range 0 n)))

The runtime for this version is:

  • n = 10^4 ...... 8.8 ms
  • n = 10^6 ...... 350 ms
  • n = 10^7 ...... 3.4 s


Why is it significantly slower to sum Ratios? I'm guessing that it has to do with finding the least common multiple of the denominators of the Ratios being summed, but does anybody know specifically which algorithm Clojure uses to sum Ratios?

asked Jun 26 '17 by bslawski


1 Answer

Let's follow what happens when you + two Ratios, which is what each step of the reduction does. We start at the two-arity version of +:

([x y] (. clojure.lang.Numbers (add x y)))

This takes us to Numbers.add(Object, Object):

return ops(x).combine(ops(y)).add((Number)x, (Number)y);
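
As a quick aside (a REPL check of my own, not part of the dispatch code above), you can confirm that both operands really are clojure.lang.Ratio instances, which is what this dispatch keys on:

;; Ratios print like 1/3 and are instances of clojure.lang.Ratio
(class (/ 1 3))     ;=> clojure.lang.Ratio
(class (+ 1/6 1/4)) ;=> clojure.lang.Ratio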

ops looks at the class of each operand and, for two Ratios, resolves to RatioOps. This leads to the RatioOps.add method:

final public Number add(Number x, Number y){
    Ratio rx = toRatio(x);
    Ratio ry = toRatio(y);
    Number ret = divide(ry.numerator.multiply(rx.denominator)
            .add(rx.numerator.multiply(ry.denominator))
            , ry.denominator.multiply(rx.denominator));
    return normalizeRet(ret, x, y);
}

So here's your algorithm. There are five BigInteger operations here (three multiplies, one add, one divide):

(yn*xd + xn*yd) / (xd*yd)
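
To make that concrete, here is the formula worked through by hand for 1/6 + 1/4 (plain arithmetic for illustration, not the library code itself):

;; x = 1/6, y = 1/4
;; yn*xd + xn*yd = 1*6 + 1*4 = 10
;; xd*yd         = 6*4       = 24
;; divide then reduces 10/24 by gcd(10, 24) = 2:
(+ 1/6 1/4) ;=> 5/12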

You can look at how BigInteger.multiply is implemented in the JDK source; it alone is not trivial, and you can examine the other operations for yourself.

Sure enough, the divide function finds the gcd of the numerator and denominator so the result can be reduced to lowest terms:

static public Number divide(BigInteger n, BigInteger d){
    if(d.equals(BigInteger.ZERO))
        throw new ArithmeticException("Divide by zero");
    BigInteger gcd = n.gcd(d);
    if(gcd.equals(BigInteger.ZERO))
        return BigInt.ZERO;
    n = n.divide(gcd);
    d = d.divide(gcd);
    ...
}
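
The same reduction step can be sketched from Clojure with BigInteger interop (an illustration of the idea, not the actual library code):

;; reduce 10/24 to lowest terms the way divide does
(let [n (biginteger 10)
      d (biginteger 24)
      g (.gcd n d)]              ; gcd(10, 24) = 2
  [(.divide n g) (.divide d g)]) ;=> [5 12]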

The gcd function creates two new MutableBigInteger objects.

Computationally, all of the above is expensive, as you can see. But don't discount the cost of the incidental object creation either (as in gcd above), which can be just as significant once non-cached memory access gets involved.

The double conversion is not free, FWIW, as it involves a division of two newly-created BigDecimals.
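
Roughly, that conversion looks like the following interop sketch (the DECIMAL64 rounding context is my assumption about the internals, so treat it as illustrative):

;; build BigDecimals from the Ratio's numerator and denominator,
;; then divide and narrow to a double
(let [r 22/7
      n (BigDecimal. (numerator r))
      d (BigDecimal. (denominator r))]
  (.doubleValue (.divide n d java.math.MathContext/DECIMAL64)))
;=> 3.142857142857143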

You really need a profiler to see exactly where the cost is. But hopefully the above gives a little context.

answered Nov 15 '22 by Josh