Number base conversion as a stream operation

Tags: algorithm, complexity-theory, math

Is there a way, in constant working space, to do arbitrary-size and arbitrary-base conversions? That is, to convert a sequence of n numbers in the range [1,m] to a sequence of ceiling(n*log(m)/log(p)) numbers in the range [1,p], using a 1-to-1 mapping that (preferably but not necessarily) preserves lexicographical order and gives sequential results?

I'm particularly interested in solutions that are viable as a pipe function, i.e. that can handle datasets larger than can be stored in RAM.

I have found a number of solutions that require "working space" proportional to the size of the input, but none yet that can get away with constant "working space".
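For comparison, here is a minimal Python sketch of the kind of solution the question rules out. It folds the whole input into one arbitrary-precision integer, so its working space grows with the input. It assumes digits in [0, m-1] rather than [1, m] (subtract 1 from each input value to get there), and `convert_whole` is a hypothetical name. Because every input of a given length maps to an output of one fixed length, it preserves lexicographic order and maps consecutive inputs to consecutive outputs.

```python
def convert_whole(digits, m, p):
    """Convert base-m digits (most significant first) to base-p digits.

    Accumulates the entire input into one big integer, so working space
    grows linearly with the input length (not a streaming solution).
    """
    value = 0
    for d in digits:
        if not 0 <= d < m:
            raise ValueError(f"digit {d} out of range for base {m}")
        value = value * m + d
    # Output length: smallest q with p**q >= m**n, i.e. ceil(n*log(m)/log(p)),
    # computed with integers to avoid floating-point rounding.
    n = len(digits)
    q, capacity, limit = 0, 1, m ** n
    while capacity < limit:
        capacity *= p
        q += 1
    out = []
    for _ in range(q):
        value, r = divmod(value, p)
        out.append(r)
    return out[::-1]  # most significant first
```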

Does dropping the sequential constraint make any difference? That is, allow lexicographically sequential inputs to result in non-lexicographically-sequential outputs:

F(1,2,6,4,3,7,8) -> (5,6,3,2,1,3,5,2,4,3)
F(1,2,6,4,3,7,9) -> (5,6,3,2,1,3,5,2,4,5)

Some thoughts:

Might this work?

streamBase_m -> convert(m, lcm(m,p)) -> convert(lcm(m,p), p) -> streamBase_p

(where lcm is the least common multiple)


asked May 19 '09 19:05

BCS

2 Answers

I don't think it's possible in the general case. If m is a power of p (or vice versa), or if they're both powers of a common base, you can do it, since each group of $\log_m p$ digits is then independent. However, in the general case, suppose you're converting the number $a_1 a_2 a_3 \ldots a_n$, where $a_1$ is the least significant digit. Its value, which we then have to express in base p, is

$$\sum_{i=1}^{n} a_i \, m^{i-1}$$

If we've processed the first $i$ digits, then we have the $i$th partial sum. To compute the $(i+1)$th partial sum, we need to add $a_{i+1} m^i$. In the general case, this number is going to have non-zero base-$p$ digits in most places, so we'll need to modify all of the output digits we've computed so far. In other words, we'll have to process all of the input digits before we know what the final output digits will be.
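A small Python check of this claim (the helper names `to_base` and `nonzero_digits` are hypothetical). It counts the non-zero base-p digits of m^i: with m = 3 and p = 2 the non-zero digits are spread throughout each power, while with m = 4 and p = 2 every power has exactly one non-zero digit.

```python
def to_base(value, p):
    """Return the base-p digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        value, r = divmod(value, p)
        digits.append(r)
        if value == 0:
            return digits


def nonzero_digits(m, p, i):
    """Count the non-zero base-p digits of m**i."""
    return sum(1 for d in to_base(m ** i, p) if d != 0)


for i in range(1, 6):
    print(i, nonzero_digits(3, 2, i), nonzero_digits(4, 2, i))
```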

In the special case where $m$ and $p$ are both powers of a common base, or equivalently where $\log_m p$ is a rational number, $m^i$ has only a few non-zero digits in base $p$, near the front, so we can safely output most of the digits we've computed so far.
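A minimal Python sketch of that special case, assuming m = b**s and p = b**t for some common base b (the function `stream_common_base` and the parameters `b`, `s`, `t` are hypothetical, not from the answer) and assuming the input length is a multiple of the group size (left-pad with zeros otherwise). Each group of lcm(s, t)/s input digits maps to exactly lcm(s, t)/t output digits, so only one group is held at a time and working space is constant for fixed bases.

```python
from math import gcd
from typing import Iterable, Iterator


def stream_common_base(digits: Iterable[int], b: int, s: int, t: int) -> Iterator[int]:
    """Stream-convert base b**s digits to base b**t digits, most significant first.

    Assumes the input length is a multiple of lcm(s, t) // s.
    Holds at most one group (a value below b**lcm(s, t)) at a time.
    """
    m, p = b ** s, b ** t
    lcm = s * t // gcd(s, t)
    in_per_group, out_per_group = lcm // s, lcm // t
    acc, count = 0, 0
    for d in digits:
        if not 0 <= d < m:
            raise ValueError(f"digit {d} out of range for base {m}")
        acc = acc * m + d
        count += 1
        if count == in_per_group:
            # This group is independent of its neighbours: emit its base-p
            # digits, most significant first, then forget it.
            for j in range(out_per_group - 1, -1, -1):
                yield (acc // p ** j) % p
            acc, count = 0, 0
    if count:
        raise ValueError("input length is not a multiple of the group size")
```

For example, base 4 to base 8 uses b = 2, s = 2, t = 3: every three base-4 digits become two base-8 digits.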

answered Sep 29 '22 18:09

Adam Rosenfield

I think there is a way of doing radix conversion in a stream-oriented fashion in lexicographic order. However, what I've come up with isn't a complete method, and it rests on a couple of assumptions:

- The length of the positional numbers is already known.
- The numbers described are integers. I've not considered what happens with the maths and negative indices.

We have a sequence of values $a$ of length $p$, where each value is in the range $[0, m-1]$. We want a sequence of values $b$ of length $q$ in the range $[0, n-1]$. (Note that here $p$ is a length and $n$ is the output base, unlike in the question.) We can work out the $k$th digit of our output sequence $b$ from $a$ as follows:

$$b_k = \left\lfloor \frac{\sum_{i=0}^{p-1} a_i m^i}{n^k} \right\rfloor \bmod n$$
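The formula above transcribed directly into Python, as a reference point (the function name `output_digit` is hypothetical). Here `a[i]` holds the digit of weight m**i, so the list is least significant first. It needs the whole of `a`, which is exactly what the rest of this answer tries to avoid.

```python
def output_digit(a, m, n, k):
    """b_k = floor(sum(a_i * m**i) / n**k) mod n, with a[i] the digit of weight m**i."""
    value = sum(a_i * m ** i for i, a_i in enumerate(a))
    return (value // n ** k) % n
```

For instance, the decimal number 123 is `a = [3, 2, 1]`, and its base-16 digits are `output_digit(a, 10, 16, 1)` = 7 and `output_digit(a, 10, 16, 0)` = 11 (0x7B).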

Let's split that sum into two parts at an arbitrary point $z$:

$$b_k = \left\lfloor \frac{\sum_{i=z}^{p-1} a_i m^i + \sum_{i=0}^{z-1} a_i m^i}{n^k} \right\rfloor \bmod n$$

Suppose that we don't yet know the values $a_i$ for $i$ in $[0, z-1]$, so we can't compute the second sum. We are then left with a range of possible values, but that range still gives us information about $b_k$.

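The unknown low part $\sum_{i=0}^{z-1} a_i m^i$ lies between $0$ and $m^z - 1$, so the quotient inside the floor-and-mod is bounded. The smallest value it can take is:

$$\left\lfloor \frac{\sum_{i=z}^{p-1} a_i m^i}{n^k} \right\rfloor$$

and the largest value it can take is:

$$\left\lfloor \frac{\sum_{i=z}^{p-1} a_i m^i + m^z - 1}{n^k} \right\rfloor$$

If these two bounds are equal, $b_k$ is known: it is that common value mod $n$. Comparing the bounds only after reducing mod $n$ is not enough, because two quotients that differ by a multiple of $n$ share a residue even though the quotients between them do not.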

This suggests a process like the following:

1. Initialise $z$ to $p$. We will count down from $p$ as we receive each digit of $a$, most significant first.
2. Initialise $k$ to the index of the most significant value in $b$, i.e. $q - 1$, where $q = \lceil \log_n(m^p) \rceil$ is the number of output digits.
3. Read a value of $a$. Decrement $z$.
4. Compute the lower and upper bounds for $b_k$.
5. If the bounds agree, output $b_k$ and decrement $k$; if $k \ge 0$, go to step 4. (We may already have enough input to determine several consecutive values of $b_k$.)
6. If $z \ne 0$, we expect more values of $a$. Go to step 3.
7. Hopefully, at this point we're done.
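A minimal Python sketch of the steps above, assuming the input arrives most significant digit first and its length is known in advance (the answer's first assumption). `stream_convert` and its parameter names are hypothetical, and `length` plays the role of p in the answer. It compares the floored quotients rather than their residues mod n. Note that `high` is an arbitrary-precision integer that grows with the input, so this sketch does not by itself achieve constant working space; computing the ranges efficiently is left open in the answer.

```python
from typing import Iterable, Iterator


def stream_convert(a_msd_first: Iterable[int], m: int, n: int, length: int) -> Iterator[int]:
    """Emit the base-n digits (most significant first) of a base-m number
    whose `length` digits arrive most significant first.

    After each input digit the unknown low part lies in [0, m**z - 1], so
    floor(value / n**k) lies between the quotients of the two extremes;
    b_k is emitted as soon as those two quotients agree.
    """
    # Step 2: q = ceil(log_n(m**length)) output digits, most significant index q - 1.
    q, cap = 0, 1
    while cap < m ** length:
        cap *= n
        q += 1
    k = q - 1
    z = length  # step 1
    high = 0  # sum of a_i * m**i over the digits received so far
    for d in a_msd_first:  # step 3
        if z == 0:
            raise ValueError("more digits than the declared length")
        z -= 1
        high += d * m ** z
        # Steps 4-5: emit every output digit the known digits already pin down.
        while k >= 0:
            lo = high // n ** k
            hi = (high + m ** z - 1) // n ** k
            if lo != hi:
                break
            yield lo % n
            k -= 1
    if z != 0:
        raise ValueError("fewer digits than the declared length")
```

For example, `list(stream_convert([1, 2, 3], 10, 16, 3))` yields `[0, 7, 11]` (0x07B = 123), and the leading 0 is emitted after reading only the first input digit.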

I've not yet considered how to compute the range values efficiently, but I'm reasonably confident that maintaining the sum from the incoming characters of a can be done much more cheaply than storing all of a. Without doing the maths, though, I won't make any hard claims about it!

answered Sep 29 '22 20:09

Wuggy
