Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Large integer radix/base conversion from 10^x to 2^x

Preface

I'm learning about computer math by writing and refining my own BigInt library. So far my first incarnation stores every digit of a base 10 number in successive elements of a vector. It can multiply and add with arbitrary precision. I want to speed it up by using all of the space available to me in a standard C++ data type by converting to base 2^x.

Information

I am reading 1000 or more digits from stdin in base 10 and I wish to convert them to base 2^x so I may store them easily in an array or vector of one of the standard C++ data types, likely unsigned int. I have only one idea for how to do base conversion, the repeated division with remainder method. Here is some C++ code describing that method:

vector<int> digits;
while(num!=0) {
   int mod = num%base;
   num = num/base;
   digits.push_back(mod);
}

Conundrums

Some things I'm lost on are whether or not division with remainder is the proper way to do radix conversion on large integers. I've tried looking at how the GMP library does it. gmp/mpn/generic/set_str.c is the relevant c source file where "the magic" happens, but I am not certain what is going on in there. Matt McCutchen's BigInt appears to use the repeated division with remainder method. If I do use this method, I essentially need write two versions of my BigInt class, one for working in Base10 and the other for Base2^x.

Conclusion

  • Offer advice on the proper steps for converting a huge number from a string to an array of 32 bit words.
  • Help me learn how GMP converts a string to an array of 32bit words without wading through many layers of abstraction.

Examples Using 4 bit word size

Number we want to store (obviously on the small size): 123456789

unsigned chars have a range of 0-255, if we want to split up our number and store it in the vector we can do it one of three ways:

  • As base 10, our vector looks like: [1,2,3,4,5,6,7,8,9]
    • This is what my vector looked like in my first implementation.
  • As base 100, our vector looks like: [1,23,45,67,89]
    • Easy to convert from base 10 to base 100, has ciel(digits in base10/2) elements.
  • As base 256, our vector looks like: [7,91,205,21]

Obviously the third solution is the most optimal for internal representation, and is what I am trying to get to.

like image 737
frontendloader Avatar asked Jun 06 '11 22:06

frontendloader


People also ask

What do you mean by radix and radix conversion?

In a positional numeral system, the radix or base is the number of unique digits, including the digit zero, used to represent numbers. For example, for the decimal/denary system (the most common system in use today) the radix (base number) is ten, because it uses the ten digits from 0 through 9.

How do I find the radix of a number?

Number System and Radix Normally, to find the radix of a number all you have to do is look at its subscript. The number written is its radix, or base. For example (a) b means that the number "a" is written in terms of "b." This number has "b" number of unique digits in its system.

How do you turn a base into a decimal?

Algorithm to Convert From Any Base to Base 10 DecimalLet n be the number of digits in the number. For example, 104 has 3 digits, so n=3. Let b be the base of the number. For example, 104 is decimal so b = 10.


1 Answers

Once you have multiply and add functions working for your bigint library, converting a string to bigint is simplicity itself. Start with a conversion result of zero. For each digit you process (left to right), multiply the previous result by 10 and add the value of the new digit (using your bigint multiply and add functions).

like image 51
Mark Ransom Avatar answered Sep 22 '22 14:09

Mark Ransom