Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rescale a vector of integers

Assume that I have a vector, V, of positive integers. If the sum of the integers are larger than a positive integer N, I want to rescale the integers in V so that the sum is <= N. The elements in V must remain above zero. The length of V is guaranteed to be <= N.

Is there an algorithm to perform this rescaling in linear time?

This is not homework, BTW :). I need to rescale a map from symbols to symbol frequencies to use range encoding.

Some quick thinking and googling has not given a solution to the problem.

EDIT:

Ok, the question was somewhat unclear. "Rescale" means "normalize". That is, transform the integers in V, for example by multiplying them by a constant, to smaller positive integers so the criterion of sum(V) <= N is fulfilled. The better the ratios between the integers are preserved, the better the compression will be.

The problem is open-ended in that way, the method does not need to find the optimal (in, say, a least squares fit sense) way to preserve the ratios, but a "good" one. Setting the entire vector to 1, as suggested, is not acceptable (unless forced). "Good" enough would for example be finding the smallest divisor (defined below) that fulfills the sum criterion.

The following naive algorithm does not work.

  1. Find the current sum(V), Sv
  2. divisor := int(ceil(Sv/N))
  3. Divide each integer in V by divisor, rounding down, but not to less than 1.

This fails on v = [1,1,1,10] with N = 5.

divisor = ceil(13 / 5) = 3.
V := [1,1,1, max(1, floor(10/3)) = 3]
Sv is now 6 > 5.

In this case, the correct normalization is [1,1,1,2]

One algorithm that would work is to do a binary search for divisor (defined above) until the smallest divisor in [1,N] fulfilling the sum criterion is found. Starting with the ceil(Sv/N) guess. This is however, not linear in number of operations, but proportional to len(V)*log(len(V)).

I am starting to think that it is impossible to do well, in linear time, in the general case. I might resort to some sort of heuristic.

like image 543
Gurgeh Avatar asked May 16 '11 16:05

Gurgeh


2 Answers

Just divide all the integers by their Greatest Common Divisor. You can find the GCD efficiently with multiple applications of Euclid's Algorithm.

d = 0
for x in xs:
    d = gcd(d, x)

xs = [x/d for x in xs]

The positive point is that you always have a small as possible representation this way, without throwing away any precision and without needing to choose a specific N. The downside is that if your frequencies are large coprime numbers you will have no choice but to sacrifice precision (and you didn't specify what should be done in this case).

like image 85
hugomg Avatar answered Sep 26 '22 00:09

hugomg


How about this:

  1. Find the current sum(V), Sv
  2. divisor := int(ceil(Sv/(N - |V| + 1))
  3. Divide each integer in V by divisor, rounding up

On v = [1,1,1,10] with N = 5:

divisor = ceil(13 / 2) = 7. V := [1,1,1, ceil(10/7)) = 2]

like image 35
starblue Avatar answered Sep 25 '22 00:09

starblue