What's the best way to do base36 arithmetic in Perl?

Tags: math, perl, base-n

To be more specific, I need to be able to do the following:

  • Operate on positive N-digit numbers in base 36 (i.e. the digits are 0-9 and A-Z)

    N is fixed and small, say 9

  • Provide basic arithmetic, at the very least the following 3:

    • Addition (A+B)

    • Subtraction (A-B)

    • Whole division, i.e. floor(A/B).

    • Strictly speaking, I don't really need a base10 conversion ability - the numbers will be in base36 100% of the time. So I'm quite OK if the solution does NOT implement conversion from base36 to base10 and back.

I don't much care whether the solution is brute-force "convert to base 10 and back" or converting to binary, or some more elegant approach "natively" performing baseN operations (as stated above, to/from base10 conversion is not a requirement). My only 3 considerations are:

  1. It fits the minimum specifications above

  2. It's "standard". Currently we're using an old homegrown module, based on base10 conversion done by hand, that is buggy and sucks.

    I'd much rather replace that with some commonly used CPAN solution instead of reinventing the bicycle from scratch, but I'm perfectly capable of building it if no better standard possibility exists.

  3. It must be fast-ish (though not lightning fast). Something that takes 1 second to sum up 2 9-digit base36 numbers is worse than anything I can roll on my own :)

P.S. Just to provide some context in case people decide to solve my XY problem for me in addition to answering the technical question above :)

We have a fairly large tree (stored in the DB as a bunch of edges), and we need to superimpose order on a subset of that tree. The tree dimensions are big both depth- and breadth-wise. The tree is VERY actively updated (inserts, deletes and branch moves).

This is currently done by having a second table with 3 columns: parent_vertex, child_vertex, local_order, where local_order is a 9-character string built of A-Z0-9 (i.e. a base 36 number).

Additional considerations:

  • It is required that the local order is unique per child (and obviously unique per parent),

  • Any complete re-ordering of a parent is somewhat expensive, and thus the implementation tries to assign - for a parent with X children - orders which are roughly evenly distributed between 0 and 36**9-1 (the range of a 9-character key), so that almost no tree inserts result in a full re-ordering.
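The spacing scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: keys are 9 characters wide (so the key space is 0 to 36**9-1), Perl has 64-bit integers, and the names decode36, encode36, spaced_keys and key_between are hypothetical helpers, not part of the existing module or of any CPAN distribution.

```perl
use strict;
use warnings;

my @DIGITS = (0 .. 9, 'A' .. 'Z');
my %VALUE  = map { $DIGITS[$_] => $_ } 0 .. $#DIGITS;
my $WIDTH  = 9;             # assumed key length (hypothetical constant)
my $SPACE  = 36**$WIDTH;    # number of distinct keys of that length

# Base-36 string -> native integer (input assumed valid: 0-9, A-Z).
sub decode36 {
    my ($str) = @_;
    my $n = 0;
    $n = $n * 36 + $VALUE{$_} for split //, uc $str;
    return $n;
}

# Non-negative integer -> base-36 string, zero-padded to $WIDTH characters.
sub encode36 {
    my ($n) = @_;
    my $str = '';
    do {
        my $digit = $n % 36;
        $str = $DIGITS[$digit] . $str;
        $n   = ($n - $digit) / 36;    # exact integer division
    } while ($n > 0);
    $str = ('0' x ($WIDTH - length $str)) . $str if length($str) < $WIDTH;
    return $str;
}

# Keys for $count children, spread evenly over the key space.
sub spaced_keys {
    my ($count) = @_;
    my $slots = $count + 1;
    my $step  = ($SPACE - $SPACE % $slots) / $slots;   # floor($SPACE / $slots)
    return map { encode36($step * $_) } 1 .. $count;
}

# Key halfway between two existing keys, or undef when there is no gap
# left (the caller would then have to re-order the parent).
sub key_between {
    my ($low, $high) = @_;
    my ($x, $y) = (decode36($low), decode36($high));
    return if $y - $x < 2;
    my $sum = $x + $y;
    return encode36(($sum - $sum % 2) / 2);
}
```

For example, spaced_keys(3) yields 900000000, I00000000 and R00000000, and a new child inserted between the first two would get the key halfway between them.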

asked Apr 19 '10 21:04 by DVK


3 Answers

What about Math::Base36?

answered Nov 20 '22 17:11 by daotoad


I am assuming that Perl core modules are OK?

How about using native (binary) integer math, converting from base 36 with POSIX::strtol() and converting the result back to base 36 with a small sub?
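A minimal sketch of that approach, using only core modules, might look like the following. The sub names (b36_decode, b36_encode, b36_add, b36_sub, b36_div) are made up for illustration, and the code assumes a Perl with 64-bit integers and a platform where C long is 64 bits wide, so that 9-digit base 36 values (below 36**9) fit comfortably.

```perl
use strict;
use warnings;
use POSIX ();

my @DIGITS = (0 .. 9, 'A' .. 'Z');

# Base-36 string -> native integer, via POSIX::strtol (case-insensitive).
sub b36_decode {
    my ($str) = @_;
    die "invalid base-36 number '$str'\n" unless $str =~ /\A[0-9A-Za-z]+\z/;
    my ($num) = POSIX::strtol($str, 36);   # list context: (value, unparsed count)
    return $num;
}

# Non-negative integer -> base-36 string, optionally zero-padded to $width.
sub b36_encode {
    my ($n, $width) = @_;
    die "cannot encode a negative value\n" if $n < 0;
    my $str = '';
    do {
        my $digit = $n % 36;
        $str = $DIGITS[$digit] . $str;
        $n   = ($n - $digit) / 36;         # exact integer division
    } while ($n > 0);
    if (defined $width && length($str) < $width) {
        $str = ('0' x ($width - length $str)) . $str;
    }
    return $str;
}

# A + B
sub b36_add {
    my ($x, $y, $width) = @_;
    return b36_encode(b36_decode($x) + b36_decode($y), $width);
}

# A - B (dies if the result would be negative)
sub b36_sub {
    my ($x, $y, $width) = @_;
    return b36_encode(b36_decode($x) - b36_decode($y), $width);
}

# floor(A / B)
sub b36_div {
    my ($x, $y, $width) = @_;
    my ($num, $den) = (b36_decode($x), b36_decode($y));
    die "division by zero\n" if $den == 0;
    return b36_encode(($num - $num % $den) / $den, $width);
}
```

With these, b36_add("ZZ", "1") gives "100", b36_div("ZZ", "A") gives "3L", and passing a width of 9 as the third argument keeps results zero-padded to the 9-character format from the question.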

There is HUGE variability in speed between the different methods of converting to/from base 36. strtol is 80x faster than Math::Base36::decode_base36, for example, and the conversion subs in the listing below are 2 to 4x faster than Math::Base36. They also support any integer base up to 62 (easily extended by adding characters to the @nums array).

Here is a quick benchmark:

#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();
use Math::BaseCnv;
use Math::Base36 ':all';
use Benchmark;

{
    # Digit alphabet: uppercase first, so base 36 uses 0-9 A-Z as in the question.
    my @nums = (0..9, 'A'..'Z', 'a'..'z');
    my $chr  = join('', @nums);
    my %nums = map { $nums[$_] => $_ } 0..$#nums;

    # Convert a non-negative integer $n to a string in base $base (up to 62).
    sub to_base
    {
        my ($base, $n) = @_;
        return $nums[0] if $n == 0;
        return $nums[0] if $base > @nums;   # more digits needed than we have
        my $str = '';
        while ( $n > 0 )
        {
            $str = $nums[$n % $base] . $str;
            $n = int( $n / $base );
        }
        return $str;
    }

    # Convert a string in base $base to an integer; returns 0 on invalid characters.
    sub fr_base
    {
        my ($base, $str) = @_;
        my $n = 0;

        return 0 if $str =~ /[^$chr]/;

        foreach my $digit (split //, $str)
        {
            $n *= $base;
            $n += $nums{$digit};
        }
        return $n;
    }
}

my $base = 36;
my $term = fr_base($base, "ZZZ");   # 46655, the largest 3-digit base 36 number

# Pre-compute the base 36 strings that the decoding benchmarks will parse.
my @numlist = map { to_base($base, $_) } 0..$term;

# Run each candidate for at least 10 CPU seconds.
timethese(-10, {
        'to_base'         => sub { for (0..$#numlist) { to_base($base, $_); } },
        'encode_base36'   => sub { for (0..$#numlist) { encode_base36($_); } },
        'cnv->to 36'      => sub { for (0..$#numlist) { cnv($_, 10, $base); } },
        'decode_base36'   => sub { foreach (@numlist) { decode_base36($_); } },
        'fr_base'         => sub { foreach (@numlist) { fr_base($base, $_); } },
        'cnv->to decimal' => sub { foreach (@numlist) { cnv($_, $base, 10); } },
        'POSIX'           => sub { foreach (@numlist) { POSIX::strtol($_, $base); } },
} );

answered Nov 20 '22 15:11 by dawg


I would bet my money on converting to base10 and back.

If you don't have to do this very often and the numbers are not very large, that is the easiest way to do it (and thus the least complex, with the fewest bugs).

Of course, another way is to also store the base10 number, for computation purposes only. However, I'm not sure whether this is possible or has any advantage in your case.

answered Nov 20 '22 15:11 by Henri