Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Number base conversion as a stream operation

Is there a way in constant working space to do arbitrary size and arbitrary base conversions. That is, to convert a sequence of n numbers in the range [1,m] to a sequence of ceiling(n*log(m)/log(p)) numbers in the range [1,p] using a 1-to-1 mapping that (preferably but not necessarily) preservers lexigraphical order and gives sequential results?

I'm particularly interested in solutions that are viable as a pipe function, e.i. are able to handle larger dataset than can be stored in RAM.

I have found a number of solutions that require "working space" proportional to the size of the input but none yet that can get away with constant "working space".


Does dropping the sequential constraint make any difference? That is: allow lexicographically sequential inputs to result in non lexicographically sequential outputs:

F(1,2,6,4,3,7,8) -> (5,6,3,2,1,3,5,2,4,3)
F(1,2,6,4,3,7,9) -> (5,6,3,2,1,3,5,2,4,5)

some thoughts:

might this work?

streamBasen -> convert(n, lcm(n,p)) -> convert(lcm(n,p), p) -> streamBasep

(where lcm is least common multiple)

like image 431
BCS Avatar asked May 19 '09 19:05

BCS


People also ask

What is number base conversions?

Number system conversions deal with the operations to change the base of the numbers. For example, to change a decimal number with base 10 to binary number with base 2. We can also perform the arithmetic operations like addition, subtraction, multiplication on the number system.

What is base conversion in Java?

A class named Demo contains a function named 'base_convert' is defined. This function parses the integer from source base to the destination base, converts it into a string and returns it as output. In the main function, the value for number, for source base and for different destination bases are defined.

Why do we need various number base conversions?

The number based conversions are essential in digital electronics..mostly in all digital system,we have the input in decimal format..but it takes as binary number for the computation by decimal to binary conversion..and we use the hexadecimal number to make coding for microprocessor but it converts that to binary for ...


2 Answers

I don't think it's possible in the general case. If m is a power of p (or vice-versa), or if they're both powers of a common base, you can do it, since each group of logm(p) is then independent. However, in the general case, suppose you're converting the number a1a2a3... an. The equivalent number in base p is

sum(ai* mi-1 for i in 1..n)

If we've processed the first i digits, then we have the ith partial sum. To compute the i+1'th partial sum, we need to add ai+1* mi. In the general case, this number is going have non-zero digits in most places, so we'll need to modify all of the digits we've processed so far. In other words, we'll have to process all of the input digits before we'll know what the final output digits will be.

In the special case where m are both powers of a common base, or equivalently if logm(p) is a rational number, then mi will only have a few non-zero digits in base p near the front, so we can safely output most of the digits we've computed so far.

like image 82
Adam Rosenfield Avatar answered Sep 29 '22 18:09

Adam Rosenfield


I think there is a way of doing radix conversion in a stream-oriented fashion in lexicographic order. However, what I've come up with isn't sufficient for actually doing it, and it has a couple of assumptions:

  1. The length of the positional numbers are already known.
  2. The numbers described are integers. I've not considered what happens with the maths and -ive indices.

We have a sequence of values a of length p, where each value is in the range [0,m-1]. We want a sequence of values b of length q in the range [0,n-1]. We can work out the kth digit of our output sequence b from a as follows:

bk = floor[ sum(ai * mi for i in 0 to p-1) / nk ] mod n

Lets rearrange that sum into two parts, splitting it at an arbitrary point z

bk = floor[ ( sum(ai * mi for i in z to p-1) + sum(ai * mi for i in 0 to z-1) ) / nk ] mod n

Suppose that we don't yet know the values of a between [0,z-1] and can't compute the second sum term. We're left with having to deal with ranges. But that still gives us information about bk.

The minimum value bk can be is:

bk >= floor[ sum(ai * mi for i in z to p-1) / nk ] mod n

and the maximum value bk can be is:

bk <= floor[ ( sum(ai * mi for i in z to p-1) + mz - 1 ) / nk ] mod n

We should be able to do a process like this:

  1. Initialise z to be p. We will count down from p as we receive each character of a.
  2. Initialise k to the index of the most significant value in b. If my brain is still working, ceil[ logn(mp) ].
  3. Read a value of a. Decrement z.
  4. Compute the min and max value for bk.
  5. If the min and max are the same, output bk, and decrement k. Goto 4. (It may be possible that we already have enough values for several consecutive values of bk)
  6. If z!=0 then we expect more values of a. Goto 3.
  7. Hopefully, at this point we're done.

I've not considered how to efficiently compute the range values as yet, but I'm reasonably confident that computing the sum from the incoming characters of a can be done much more reasonably than storing all of a. Without doing the maths though, I won't make any hard claims about it though!

like image 31
Wuggy Avatar answered Sep 29 '22 20:09

Wuggy