An integer measures 4 bytes. In my example I have numbers measuring 1 MB. How can I convert them to a human readable decimal number fast?
The number is present in uint[]
Array containing Size
items.
Write each digit under its place value. Multiply each digit by its base raised to the corresponding place value. Add up the products. The answer will be the decimal number in base ten.
byte: 8 bits, can represent positive numbers from 0 to 255.
In Python an integer can be converted into a string using the built-in str() function. The str() function takes in any python data type and converts it into a string.
I've been thinking about your problem. I don't have a solution coded up, but here is an approach:
First off, let's assume without loss of generality that you have a collection of 2n bits. (If you have fewer than exactly 2n bits, pad out the bit array with leading zeros until you do. Obviously doing so never more than doubles the size of the array.) In your case you say you have a million uints, so that's 225 bits.
Let's also assume that every collection of 2k + 1 bits can be split evenly into two collections of bits, the left and right collections, each with 2k bits.
Thus every bit collection, or sub-collection, has a "size" which is an exact power of two. The smallest possible collection contains a single bit and cannot be further subdivided.
Second, let's assume that you similarly have an immutable representation of a number in decimal form, and that again, without loss of generality, there are 2d decimal digits in the string. If there are fewer than exactly 2d, again, pad with leading zeros. Again, each decimal collection of size greater than one can be split into two collections each half the size.
Now we sketch out a recursive algorithm:
DigitCollection Convert(BitCollection bits)
{
if (bits.Size == 1)
return bits[0] ? DigitCollection.SingleOne : DigitCollection.SingleZero;
DigitCollection left = Convert(bits.Left);
DigitCollection right = Convert(bits.Right);
DigitCollection power = MakePowerOfTwo(bits.Right.Size);
return Add(Multiply(left, power), right);
}
We now need methods Add, Multiply and MakePowerOfTwo. As we'll see in a moment, we'll also need Subtract, and two Shift operators, for rapidly multiplying by powers of ten.
Addition and subtraction are easy. Clearly if the longer collection contains n digits then addition and subtraction methods can be implemented to take O(n) time.
FullShift and HalfShift operators make new digit collections out of old ones to facilitate rapid multiplication by powers of ten. If a digit collection of size 2d+1 consists of sub-collections (X1, X2) each of size 2d then the "half-shifted" collection contains 2d+2 items and consists of ( (2d leading zeros, X1), (X2, 2d trailing zeros)). The full-shift collection consists of ((X1, X2), (2d+1 trailing zeros)).
These are obviously very cheap to construct.
Multiplication is where we run into big problems. Suppose without loss of generality that we are multiplying together two DigitCollections each with exactly 2d digits. We can use Karatsuba's Algorithm:
DigitCollection Multiply(DigitCollection x, DigitCollection y)
{
// Assume x and y are the same digit size.
if (x and y are both single digits)
return the trivial solution;
// Assume that z2, z1 and z0 are all the same digit size as x and y.
DigitCollection z2 = Multiply(x.Left, y.Left);
DigitCollection z0 = Multiply(x.Right, y.Right);
DigitCollection z1 = Subtract(
Multiply(Add(x.Left, x.Right), Add(y.Left, y.Right)),
Add(z2, z0));
return Add(Add(FullShift(z2), HalfShift(z1)), z0);
}
What is the order of this algorithm? Suppose there are n = 2d digits. What is O(Multiply(n))? We recurse three times, each with a problem with half as many digits. The rest of the add, subtract and shift operations are each no more than O(n). So we have a recurrance:
T(n) = 3 T(n/2) + O(n)
Which has an easy solution via the Master Theorem: this algorithm is O(n1/lg 3) which is about O(n1.58).
What about MakePowerOfTwo? That's easy given what we already have. We use the identity:
22n + 1 = 2n x 2n + 2n x 2n
and write the algorithm:
DigitCollection MakePowerOfTwo(int p)
{
if (p == 0) return DigitCollection.SingleOne;
DigitCollection power = MakePowerOfTwo(p / 2);
power = Multiply(power, power);
if (p % 2 != 0)
power = Add(power, power);
return power;
}
It is dominated by the computation of the multiplication, and so is O(n1.58) as well.
And now we can see that the original call to Convert is also dominated by the multiplication.
So with this algorithm if you have 2d binary digits to convert, you can expect it will take about O(21.58 d) steps to do so. In your case you have 225 bits to convert, so that should take about 777 billion calculations.
The key fact here is that this algorithm is utterly dominated by the cost of the multiplication. If you can reduce the cost of the multiplication to less than O(n1.58) then you get huge wins. If I were you I would be studying improvements to decimal multiplication algorithms over Karatsuba.
I don't know if this is any faster, but here is an example in delphi I wrote a long time ago to handle big ints as strings (VERY quick and dirty) - this was for 128bit uint but you could extend it indefinitely
Function HexToBinShort(hex:integer):string;
begin
case hex of
0: result:='0000'; //convert each hex digit to binary string
1: result:='0001'; //could do this with high-nybble and low nybble
2: result:='0010'; //of each sequential byte in the array (mask and bit shift)
3: result:='0011'; //ie: binstring:=binstring + HexToBinShort(nybble[i])
4: result:='0100'; //but must count DOWN for i (start with MSB!)
5: result:='0101';
6: result:='0110';
7: result:='0111';
8: result:='1000';
9: result:='1001';
10: result:='1010';
11: result:='1011';
12: result:='1100';
13: result:='1101';
14: result:='1110';
15: result:='1111';
end;
end;
Then take the concatenated binary string and add powers of two each time you see a '1'
Function BinToIntString(binstring:string):string;
var i, j : integer;
var calcHold, calc2 :string;
begin
calc2:=binstring[Length(binstring)]; // first bit is easy 0 or 1
for i := (Length(binstring) - 1) downto 1 do begin
if binstring[i] = '1' then begin
calcHold:=generateCard(Length(binstring)-i);
calc2 := AddDecimalStrings(calcHold, calc2);
end;
end;
result:=calc2;
end;
generateCard is used to create a decimal string representation of 2^i (for i>0)
Function generateCard(i:integer):string;
var j : integer;
var outVal : string;
begin
outVal := '2';
if i > 1 then begin
for j := 2 to i do begin
outVal:= MulByTwo(outVal);
end;
end;
result := outVal;
end;
and MulByTwo multiplies a decimal string by two
Function MulByTwo(val:string):string;
var i : integer;
var carry, hold : integer;
var outHold : string;
var outString :string;
var outString2 : string;
begin
outString:= StringOfChar('0', Length(val) + 1);
outString2:= StringOfChar('0', Length(val));
carry :=0;
for i := Length(val) downto 1 do begin
hold := StrToInt(val[i]) * 2 + carry;
if hold >= 10 then begin
carry := 1;
hold := hold - 10;
end else begin
carry := 0;
end;
outHold := IntToStr(hold);
outString[i+1] := outHold[1];
end;
if carry = 1 then begin
outString[1] := '1';
result := outString;
end else begin
for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
result := outString2;
end;
end;
And finally - AddDecimalStrings...well, it adds two decimal strings :
Function AddDecimalStrings(val1, val2:string):string;
var i,j :integer;
var carry, hold, largest: integer;
var outString, outString2, bigVal, smVal, outHold:string;
begin
if Length(val1) > Length(val2) then begin
largest:= Length(val1);
bigVal := val1;
smVal := StringOfChar('0', largest);
j:=1;
for i := (largest - length(val2) +1) to largest do begin
smVal[i] := val2[j];
j:=j+1;
end;
end else begin
if length(val2) > Length(val1) then begin
largest:=Length(val2);
bigVal:=val2;
smVal := StringOfChar('0', largest);
j:=1;
for i := (largest - length(val1) +1) to largest do begin
smVal[i] := val1[j];
j:=j+1;
end;
end else begin
largest:=length(val1);
bigVal:=val1;
smVal:=val2;
end;
end;
carry:=0;
outString:=StringOfChar('0', largest +1);
outString2:=StringOfChar('0', largest);
for i := largest downto 1 do begin
hold := StrToInt(bigVal[i]) + StrToInt(smVal[i]) + carry;
if hold >=10 then begin
carry:=1;
hold := hold - 10;
end else begin
carry:=0;
end;
outHold:= IntToStr(hold);
outString[i+1]:=outHold[1];
end;
if carry = 1 then begin
outString[1] := '1';
result := outString;
end else begin
for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
result := outString2;
end;
end;
These functions allow you to perform basic arithmetic on almost arbitrarily large integers as strings. You hit another wall when the number of digits is too big to index an array with, of course.
Here's a divide by two, btw (useful for going the other way...). I don't handle odd numbers here.
Function DivByTwo(val:string):string;
var i : integer;
var hold : integer;
var outHold : string;
var outString, outString2 :string;
begin
outString:=StringOfChar('0',Length(val));
for i := Length(val) downto 1 do begin
if StrToInt(val[i]) mod 2 = 0 then begin
hold:= Math.Floor(StrToInt(val[i]) / 2);
outHold:= IntToStr(hold);
outString[i]:=outHold[1];
end else begin
hold:= Math.Floor((StrToInt(val[i]) - 1) / 2);
outHold:=IntToStr(hold);
outString[i]:= outHold[1];
if i <> Length(val) then begin
hold:= StrToInt(outString[i+1]) + 5;
outHold:= IntToStr(hold);
outString[i+1] := outHold[1];
end;
end;
end;
outString2:=StringOfChar('0',Length(val)-1);
if (outString[1] = '0') and (length(outString) > 1) then begin
for i := 1 to length(outString2) do outString2[i]:=outString[i+1];
result:=outString2;
end else begin
result:=outString;
end;
end;
EDIT: I just tried this with a 9million bit long binary string and it is ludicrously slow! Not surprising, really. This is completely unoptimized code which has a lot of low hanging fruit to pick at for speeding things up. Still, I can't help but feel that this is the kind (or scale) of problem that you would probably want to write, at least partially, in fully optimized assembly. The individual operations are small but they must be done many times - that spells begging for assembly. Multithreading could certainly be leveraged here too.
You might be able to save some time by doing more than one digit at a time. if you do it, say, 100,000 at a time, it'll likely go at least a little faster than 10 at a time.
Mind you, it's still likely to be pretty painfully slow, but it'll save you some time.
It's conceivable that you could make it recursive, and speed it up that much more - get the rough square root of the number, rounded down to the nearest exponent of 10. div and mod by that number, and send the results back to the same function. Mind you, I'm not sure how you'd go about efficiently running a div or mod of that size, but if you can figure it out (and don't run out of memory) it's still bound to be more time-efficient than dividing it out a digit at a time.
Alternate version: give up on decimals - since the number's going to be way too large to make sense to any actual human readers anyway - and render the thing in hex. Still technically human-readable, but you can render it a byte at a time and save yourself a whole lot of heartache.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With