How can I improve the performance of my Julia program for excellent numbers?

Question

I've been playing with some Perl programs to calculate excellent numbers. Although the runtimes for my solutions were acceptable, I thought a different language, especially one designed for numeric stuff, might be faster. A friend suggested Julia, but the performance I'm seeing is so bad I must be doing something wrong. I've looked through the Performance Tips and don't see what I should improve:

digits = int( ARGS[1] )

const k = div( digits, 2 )

for a = ( 10 ^ (k - 1) ) : ( 10 ^ (k) - 1 )
    front = a * (10 ^ k + a)

    root = floor( front ^ 0.5 )

    for b = ( root - 1 ): ( root + 1 )
        back = b * (b - 1);
        if back > front
            break
        end
        if log(10,b) > k
            continue
        end
        if front == back
            @printf "%d%d
" a b
        end
    end
 end

I have an equivalent C program that's an order of magnitude faster instead of the factor of 2 noted on the Julia page (although most of the Stackoverflow questions about Julia's speed seem to point out flawed benchmarks from that page):

And the non-optimized pure Perl I wrote takes half as long:

use v5.20;

my $digits = $ARGV[0] // 2;
die "Number of digits must be even and non-zero! You said [$digits]
"
    unless( $digits > 0 and $digits % 2 == 0 and int($digits) eq $digits );

my $k  = ( $digits / 2 );

foreach my $n ( 10**($k-1) .. 10**($k) - 1 ) {
    my $front = $n*(10**$k + $n);
    my $root = int( sqrt( $front ) );

    foreach my $try ( $root - 2 .. $root + 2 ) {
        my $back = $try * ($try - 1);
        last if length($try) > $k;
        last if $back > $front;
        # say "	n: $n back: $back try: $try front: $front";
        if( $back == $front ) {
            say "$n$try";
            last;
            }
        }
    }

I'm using the pre-compiled Julia for Mac OS X since I couldn't get the source to compile (but I didn't try beyond it blowing up the first time). I figure that's part of it.

Also, I see about a 0.7 second start up time for any Julia program (see Slow Julia Startup Time), which means the equivalent compiled C program can run about 200 times before Julia finishes once. As the runtime increases (bigger values of digits) and the startup time means less, my Julia program is still really slow.

I haven't gotten to the part for very large numbers (20+ digit excellent numbers) which I didn't realize Julia doesn't handle those any better than most other languages.

Here's my C code, which is a little different from when I started this. My rusty, inelegant C skills are essentially the same thing as my Perl.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main( int argc, char *argv[] ) {
    long 
        k, digits,
        start, end,
        a, b,
        front, back,
        root
        ;

    digits = atoi( argv[1] );

    k = digits / 2;

    start = (long) pow(10, k - 1);
    end   = (long) pow(10, k);

    for( a = start; a < end; a++ ) {
        front = (long) a * ( pow(10,k) + a );
        root  = (long) floor( sqrt( front ) );

        for( b = root - 1; b <= root + 1; b++ ) {
            back = (long) b * ( b - 1 );
            if( back > front )  { break; }
            if( log10(b) > k ) { continue; }
            if( front == back ) {
                printf( "%ld%ld
", a, b );
                }
            }
        }

    return 0;
    }

IainDunning · Accepted Answer

I've benchmarked your code (brian.jl) against the following code, which attempts to make minimal changes to your code and follows Julian style:

function excellent(digits)
    k = div(digits, 2)
    l = 10 ^ (k - 1)
    u = (10 ^ k) - 1

    for a in l:u
        front = a * (10 ^ k + a)
        root = isqrt(front)
        for b = (root - 1):(root + 1)
            back = b * (b - 1)
            back > front && break
            log10(b) > k && continue
            front == back && println(a,b)
        end
    end
end
excellent(int(ARGS[1]))

Separating out u and l was a personal preference for readability. As a baseline, the Julia startup time on my machine is:

$ time julia -e ''
real    0m0.248s
user    0m0.306s
sys 0m0.091s

So if the computation you are running per execution of Julia from a cold start is on the order of 0.3 seconds, then Julia might not be a great choice for you at this stage. I passed in 16 to the scripts, and got:

$ time julia brian.jl 16
1045751633986928
1140820035650625
3333333466666668

real    0m15.973s
user    0m15.691s
sys     0m0.586s

and

$ time julia iain.jl 16
1045751633986928
1140820035650625
3333333466666668

real    0m9.691s
user    0m9.839s
sys     0m0.155s

A limitation of this code as written is that if digits>=20 we will exceed the storage of a Int64. Julia, for performance reasons, doesn't auto-promote integer types to arbitrary precision integers. We can use our knowledge of the problem to address this by changing the last line to:

digits = int(ARGS[1])
excellent(digits >= 20 ? BigInt(digits) : digits)

We get the BigInt version of excellent for free, which is nice. Ignoring that for now, on profiling my version I found that ~74% of the time is spent calculating log10, following by ~19% on isqrt. I did this profiling by replacing the last line with

excellent(4)  # Warm up to avoid effects of JIT
@profile excellent(int(ARGS[1]))
Profile.print()

Now if we wanted to dabble in minor algorithmic changes, given what we know now from the profiler, we can replace the log10 line (which is just checking the number of digits in effect) with ndigits(b) > k && continue, which gives us

$ time julia iain.jl 16
1045751633986928
1140820035650625
3333333466666668

real    0m3.634s
user    0m3.785s
sys     0m0.153s

This changes the balance to about ~56% from the isqrt and ~28% from ndigits. Digging further into that 56%, about half is spent executing this line which seems like a pretty sensible algorithm, so any improvement would probably change the spirit of the comparison as it'd really be a different approach altogether. Investigating the machine code with @code_native tends to suggest there is nothing else too strange going on, although I didn't dig deep into this.

If I allow myself to engage in some more minor algorithmic improvements, I can start from root+1 and only doing the ndigits check once, i.e.

for a in l:u
    front = a * (10^k + a)
    root = isqrt(front)
    b = root + 1
    ndigits(b) > k && continue
    front == b*(b-1) && println(a,b)
    b = root
    front == b*(b-1) && println(a,b)
    b = root - 1
    front == b*(b-1) && println(a,b)
end

which gets me down to

real    0m2.901s
user    0m3.050s
sys     0m0.154s

(I'm not convinced the second two equality checks are needed, but I'm trying to minimize differences!). Finally I thought I'd eke out some extra speed by precomputing 10^k, i.e. k10 = 10^k, which seems to be computed freshly each iteration. With that, I get to

real    0m2.518s
user    0m2.670s
sys     0m0.153s

Which is a fairly nice 20x improvement over the original code.

StefanKarpinski · Answer

I was curious about how Perl was getting such good performance from this code, so I felt I ought to do a comparison. Since there are some seemingly needless differences in both control flow and operations between the Perl and Julia versions of the code in the question, I ported each version to the other language and compared all four. I also wrote a fifth Julia version using more idiomatic numerical functions but with the same control flow structure as the question's Perl version.

The first variant is essentially the Perl code from the question, but wrapped up in a function:

sub perl1 {
    my $k = $_[0];
    foreach my $n (10**($k-1) .. 10**($k)-1) {
        my $front = $n * (10**$k + $n);
        my $root = int(sqrt($front));

        foreach my $t ($root-2 .. $root+2) {
            my $back = $t * ($t - 1);
            last if length($t) > $k;
            last if $back > $front;
            if ($back == $front) {
                print STDERR "$n$t
";
                last;
            }
        }
    }
}

Next, I translated this to Julia, keeping the same control flow and using the same operations – it takes the integer floor of the square root of front in the outer loop and takes the length of the "stringification" of t in the inner loop:

function julia1(k)
    for n = 10^(k-1):10^k-1
        front = n*(10^k + n)
        root = floor(Int,sqrt(front))

        for t = root-2:root+2
            back = t * (t - 1)
            length(string(t)) > k && break
            back > front && break
            if back == front
                println(STDERR,n,t)
                break
            end
        end
    end
end

Here's the question's Julia code with some minor formatting tweaks, wrapped up in a function:

function julia2(k)
    for a = 10^(k-1):10^k-1
        front = a * (10^k + a)
        root = floor(front^0.5)

        for b = root-1:root+1
            back = b * (b - 1);
            back > front && break
            log(10,b) > k && continue
            if front == back
                @printf STDERR "%d%d
" a b
                # missing break?
            end
        end
    end
end

I translated this back to Perl, keeping the same control flow structure and using the same operations as the Perl code – taking floor of root raised to the 0.5 power in the outer loop, and taking the logarithm base 10 in the inner loop:

sub perl2 {
    my $k = $_[0];
    foreach my $a (10**($k-1) .. 10**($k)-1) {
        my $front = $a * (10**$k + $a);
        my $root = int($front**0.5);

        foreach my $b ($root-1 .. $root+1) {
            my $back = $b * ($b - 1);
            last if $back > $front;
            next if log($b)/log(10) > $k;
            if ($front == $back) {
                print STDERR "$a$b
"
            }
        }
    }
}

Finally, I wrote a Julia version that has the same control flow structure as the question's Perl version but uses more idiomatic numerical operations – the isqrt and ndigits functions:

function julia3(k)
    for n = 10^(k-1):10^k-1
        front = n*(10^k + n)
        root = isqrt(front)

        for t = root-2:root+2
            back = t * (t - 1)
            ndigits(t) > k && break
            back > front && break
            if back == front
                println(STDERR,n,t)
                break
            end
        end
    end
end

As far as I know (I used to do a lot of Perl programming, but it's been a while), there aren't Perl versions of either of these operations, so there's no corresponding perl3 variant.

I ran all five variations with Perl 5.18.2 and Julia 0.3.9, respectively, ten times each for 2, 4, 6, 8, 10, 12 and 14 digits. Here are the timing results:

median execution time in seconds vs. digits

The x axis is the number of digits requested. The y axis is median time in seconds required to compute each function. The y axis is plotted on a log scale (there's some rendering bug in the Cairo backend of Gadfly, so the superscripts don't appear very raised). We can see that except for the very smallest number of digits (2), all three Julia variants are faster than both Perl variants – and julia3 is substantially faster than everything else. How much faster? Here's a comparison of the other four variants relative to julia3 (not log scale):

time relative to julia3 vs. digits

The x axis is the number of digits requested again, while the y axis is how many times slower each variant was than julia3. As you can see here, I was unable to reproduce the Perl performance claimed in the question – the Perl code was not 2x faster than Julia – it was 7 to 40 times slower than julia3 and at least 2x slower than the slowest Julia variant for any non-trivial number of digits. I didn't test with Perl 5.20 – perhaps someone could follow up by running these benchmarks with a newer Perl and see if that explains the different results? Code to run the benchmarks can be found here: excellent.pl, excellent.jl. I ran the them like this:

cat /dev/null >excellent.csv
for d in 2 4 6 8 10 12 14; do
    perl excellent.pl $d >>excellent.csv
    julia excellent.jl $d >>excellent.csv
done

I analyzed the resulting excellent.csv file using this Julia script.

Finally, as has been mentioned in the comments, using BigInt or Int128 is an option for exploring larger excellent numbers in Julia. However, this requires a little care for writing the algorithm generically. Here's a fourth variation that works generically:

function julia4(k)
    ten = oftype(k,10)
    for n = ten^(k-1):ten^k-1
        front = n*(ten^k + n)
        root = isqrt(front)

        for t = root-2:root+2
            back = t * (t - 1)
            ndigits(t) > k && break
            back > front && break
            if back == front
                println(STDERR,n,t)
                break
            end
        end
    end
end

This is the same as julia3 but works for generic integer types by converting 10 to the type of its argument. Since the algorithm scales exponentially, however, it still takes a very long time to compute for any number of digits much larger than 14:

julia> @time julia4(int128(10)) # digits = 20
21733880705143685100
22847252005297850625
23037747345324014028
23921499005444619376
24981063345587629068
26396551105776186476
31698125906461101900
33333333346666666668
34683468346834683468
35020266906876369525
36160444847016852753
36412684107047802476
46399675808241903600
46401324208242096401
48179452108449381525
elapsed time: 2260.27479767 seconds (5144 bytes allocated)

It works, but 37 minutes is kind of a long time to wait. Using a faster programming language only gets you a constant factor speedup – 40x in this case – but that only buys a couple of extra digits. To really explore larger excellent numbers, you'll need to look into better algorithms.

How can I improve the performance of my Julia program for excellent numbers?

Tags:

performance

julia

brian d foy

2 Answers

IainDunning

StefanKarpinski

Recent Activity

Donate For Us

How can I improve the performance of my Julia program for excellent numbers?

Tags:

performance

julia

brian d foy

2 Answers

IainDunning

StefanKarpinski

Related questions

Recent Activity

Donate For Us