Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert very huge number to base 10 string

Tags:

c#

math

An integer measures 4 bytes. In my example I have numbers measuring 1 MB. How can I convert them to a human readable decimal number fast?

The number is present in uint[] Array containing Size items.

like image 539
bytecode77 Avatar asked Mar 01 '12 18:03

bytecode77


People also ask

How do you convert a number to base 10?

Write each digit under its place value. Multiply each digit by its base raised to the corresponding place value. Add up the products. The answer will be the decimal number in base ten.

How high can you count in base 10 in a byte?

byte: 8 bits, can represent positive numbers from 0 to 255.

How do you convert a big number to a string in Python?

In Python an integer can be converted into a string using the built-in str() function. The str() function takes in any python data type and converts it into a string.


3 Answers

I've been thinking about your problem. I don't have a solution coded up, but here is an approach:

First off, let's assume without loss of generality that you have a collection of 2n bits. (If you have fewer than exactly 2n bits, pad out the bit array with leading zeros until you do. Obviously doing so never more than doubles the size of the array.) In your case you say you have a million uints, so that's 225 bits.

Let's also assume that every collection of 2k + 1 bits can be split evenly into two collections of bits, the left and right collections, each with 2k bits.

Thus every bit collection, or sub-collection, has a "size" which is an exact power of two. The smallest possible collection contains a single bit and cannot be further subdivided.

Second, let's assume that you similarly have an immutable representation of a number in decimal form, and that again, without loss of generality, there are 2d decimal digits in the string. If there are fewer than exactly 2d, again, pad with leading zeros. Again, each decimal collection of size greater than one can be split into two collections each half the size.

Now we sketch out a recursive algorithm:

DigitCollection Convert(BitCollection bits)
{
    if (bits.Size == 1)
        return bits[0] ? DigitCollection.SingleOne : DigitCollection.SingleZero;
    DigitCollection left = Convert(bits.Left);        
    DigitCollection right = Convert(bits.Right);
    DigitCollection power = MakePowerOfTwo(bits.Right.Size);
    return Add(Multiply(left, power), right);
}

We now need methods Add, Multiply and MakePowerOfTwo. As we'll see in a moment, we'll also need Subtract, and two Shift operators, for rapidly multiplying by powers of ten.

Addition and subtraction are easy. Clearly if the longer collection contains n digits then addition and subtraction methods can be implemented to take O(n) time.

FullShift and HalfShift operators make new digit collections out of old ones to facilitate rapid multiplication by powers of ten. If a digit collection of size 2d+1 consists of sub-collections (X1, X2) each of size 2d then the "half-shifted" collection contains 2d+2 items and consists of ( (2d leading zeros, X1), (X2, 2d trailing zeros)). The full-shift collection consists of ((X1, X2), (2d+1 trailing zeros)).

These are obviously very cheap to construct.

Multiplication is where we run into big problems. Suppose without loss of generality that we are multiplying together two DigitCollections each with exactly 2d digits. We can use Karatsuba's Algorithm:

DigitCollection Multiply(DigitCollection x, DigitCollection y)
{
    // Assume x and y are the same digit size.
    if (x and y are both single digits)
        return the trivial solution;
    // Assume that z2, z1 and z0 are all the same digit size as x and y.
    DigitCollection z2 = Multiply(x.Left, y.Left);
    DigitCollection z0 = Multiply(x.Right, y.Right);
    DigitCollection z1 = Subtract(
        Multiply(Add(x.Left, x.Right), Add(y.Left, y.Right)),
        Add(z2, z0));
    return Add(Add(FullShift(z2), HalfShift(z1)), z0);
}

What is the order of this algorithm? Suppose there are n = 2d digits. What is O(Multiply(n))? We recurse three times, each with a problem with half as many digits. The rest of the add, subtract and shift operations are each no more than O(n). So we have a recurrance:

T(n) = 3 T(n/2) + O(n)

Which has an easy solution via the Master Theorem: this algorithm is O(n1/lg 3) which is about O(n1.58).

What about MakePowerOfTwo? That's easy given what we already have. We use the identity:

22n + 1 = 2n x 2n + 2n x 2n

and write the algorithm:

DigitCollection MakePowerOfTwo(int p)
{
    if (p == 0) return DigitCollection.SingleOne;
    DigitCollection power = MakePowerOfTwo(p / 2);
    power = Multiply(power, power);
    if (p % 2 != 0)
        power = Add(power, power);
    return power;
}

It is dominated by the computation of the multiplication, and so is O(n1.58) as well.

And now we can see that the original call to Convert is also dominated by the multiplication.

So with this algorithm if you have 2d binary digits to convert, you can expect it will take about O(21.58 d) steps to do so. In your case you have 225 bits to convert, so that should take about 777 billion calculations.

The key fact here is that this algorithm is utterly dominated by the cost of the multiplication. If you can reduce the cost of the multiplication to less than O(n1.58) then you get huge wins. If I were you I would be studying improvements to decimal multiplication algorithms over Karatsuba.

like image 129
Eric Lippert Avatar answered Sep 18 '22 23:09

Eric Lippert


I don't know if this is any faster, but here is an example in delphi I wrote a long time ago to handle big ints as strings (VERY quick and dirty) - this was for 128bit uint but you could extend it indefinitely

Function HexToBinShort(hex:integer):string;
begin
  case hex of
    0:  result:='0000';  //convert each hex digit to binary string
    1:  result:='0001';  //could do this with high-nybble and low nybble
    2:  result:='0010';  //of each sequential byte in the array (mask and bit shift)
    3:  result:='0011';  //ie: binstring:=binstring + HexToBinShort(nybble[i])
    4:  result:='0100';  //but must count DOWN for i (start with MSB!)
    5:  result:='0101';
    6:  result:='0110';
    7:  result:='0111';
    8:  result:='1000';
    9:  result:='1001';
    10: result:='1010';
    11: result:='1011';
    12: result:='1100';
    13: result:='1101';
    14: result:='1110';
    15: result:='1111';
  end;
end;

Then take the concatenated binary string and add powers of two each time you see a '1'

Function BinToIntString(binstring:string):string;
var i, j : integer;
var calcHold, calc2 :string;
begin
  calc2:=binstring[Length(binstring)];   // first bit is easy 0 or 1
  for i := (Length(binstring) - 1) downto 1 do begin       
    if binstring[i] = '1' then begin   
       calcHold:=generateCard(Length(binstring)-i);
       calc2 := AddDecimalStrings(calcHold, calc2);
    end;
  end;
  result:=calc2;
end;

generateCard is used to create a decimal string representation of 2^i (for i>0)

Function generateCard(i:integer):string;
var j : integer;
var outVal : string;
begin
  outVal := '2';
  if i > 1 then begin
    for j := 2 to i do begin
      outVal:= MulByTwo(outVal);
    end;
  end;
  result := outVal;
end;

and MulByTwo multiplies a decimal string by two

Function MulByTwo(val:string):string;
var i : integer;
var carry, hold : integer;
var outHold : string;
var outString :string;
var outString2 : string;
begin
  outString:= StringOfChar('0', Length(val) + 1);
  outString2:= StringOfChar('0', Length(val));
  carry :=0;
  for i := Length(val) downto 1 do begin
    hold := StrToInt(val[i]) * 2 + carry;
    if hold >= 10 then begin
      carry := 1;
      hold := hold - 10;
    end else begin
      carry := 0;
    end;
    outHold := IntToStr(hold);
    outString[i+1] := outHold[1];
  end;
  if carry = 1 then begin
    outString[1] := '1';
    result := outString;
  end else begin
    for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
    result := outString2;
  end;
end; 

And finally - AddDecimalStrings...well, it adds two decimal strings :

Function AddDecimalStrings(val1, val2:string):string;
var i,j :integer;
var carry, hold, largest: integer;
var outString, outString2, bigVal, smVal, outHold:string;
begin
  if Length(val1) > Length(val2) then begin
    largest:= Length(val1);
    bigVal := val1;
    smVal := StringOfChar('0', largest);
    j:=1;
    for i := (largest - length(val2) +1) to largest do begin
      smVal[i] := val2[j];
      j:=j+1;
    end;
  end else begin
    if length(val2) > Length(val1) then begin
      largest:=Length(val2);
      bigVal:=val2;
      smVal := StringOfChar('0', largest);
      j:=1;
      for i := (largest - length(val1) +1) to largest do begin
        smVal[i] := val1[j];
        j:=j+1;
      end;
    end else begin
      largest:=length(val1);
      bigVal:=val1;
      smVal:=val2;
    end;
  end;
  carry:=0;
  outString:=StringOfChar('0', largest +1);
  outString2:=StringOfChar('0', largest);

  for i := largest downto 1 do begin
    hold := StrToInt(bigVal[i]) + StrToInt(smVal[i]) + carry;
    if hold >=10 then begin
      carry:=1;
      hold := hold - 10;
    end else begin
      carry:=0;
    end;
    outHold:= IntToStr(hold);
    outString[i+1]:=outHold[1];
  end;

  if carry = 1 then begin
    outString[1] := '1';
    result := outString;
  end else begin
    for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
    result := outString2;
  end;  
end;

These functions allow you to perform basic arithmetic on almost arbitrarily large integers as strings. You hit another wall when the number of digits is too big to index an array with, of course.

Here's a divide by two, btw (useful for going the other way...). I don't handle odd numbers here.

Function DivByTwo(val:string):string;
var i : integer;
var hold : integer;
var outHold : string;
var outString, outString2 :string;
begin
  outString:=StringOfChar('0',Length(val));

  for i := Length(val) downto 1 do begin
    if StrToInt(val[i]) mod 2 = 0 then begin
      hold:= Math.Floor(StrToInt(val[i]) / 2);
      outHold:= IntToStr(hold);
      outString[i]:=outHold[1];
    end else begin
      hold:= Math.Floor((StrToInt(val[i]) - 1) / 2);
      outHold:=IntToStr(hold);
      outString[i]:= outHold[1];
      if i <> Length(val) then begin
        hold:= StrToInt(outString[i+1]) + 5;
        outHold:= IntToStr(hold);
        outString[i+1] := outHold[1];
      end;
    end;
  end;
  outString2:=StringOfChar('0',Length(val)-1);
  if (outString[1] = '0') and (length(outString) > 1) then begin
    for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
    result:=outString2;
  end else begin
    result:=outString;
  end;
end;

EDIT: I just tried this with a 9million bit long binary string and it is ludicrously slow! Not surprising, really. This is completely unoptimized code which has a lot of low hanging fruit to pick at for speeding things up. Still, I can't help but feel that this is the kind (or scale) of problem that you would probably want to write, at least partially, in fully optimized assembly. The individual operations are small but they must be done many times - that spells begging for assembly. Multithreading could certainly be leveraged here too.

like image 45
J... Avatar answered Sep 18 '22 23:09

J...


You might be able to save some time by doing more than one digit at a time. if you do it, say, 100,000 at a time, it'll likely go at least a little faster than 10 at a time.

Mind you, it's still likely to be pretty painfully slow, but it'll save you some time.

It's conceivable that you could make it recursive, and speed it up that much more - get the rough square root of the number, rounded down to the nearest exponent of 10. div and mod by that number, and send the results back to the same function. Mind you, I'm not sure how you'd go about efficiently running a div or mod of that size, but if you can figure it out (and don't run out of memory) it's still bound to be more time-efficient than dividing it out a digit at a time.

Alternate version: give up on decimals - since the number's going to be way too large to make sense to any actual human readers anyway - and render the thing in hex. Still technically human-readable, but you can render it a byte at a time and save yourself a whole lot of heartache.

like image 44
Ben Barden Avatar answered Sep 17 '22 23:09

Ben Barden