Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Constructing fractions Interview challenge

I recently came across the following interview question, I was wondering if a dynamic programming approach would work, or/and if there was some kind of mathematical insight that would make the solution easier... Its very similar to how ieee754 doubles are constructed.

Question: There is vector V of N double values. Where the value at the ith index of the vector is equal to 1/2^(i+1). eg: 1/2, 1/4, 1/8, 1/16 etc...

You're to write a function that takes one double 'r' as input, where 0 < r < 1, and output the indexes of V to stdout that when summed will give a value closest to the value 'r' than any other combination of indexes from the vector V.

Furthermore the number of indexes should be a minimum, and in the event there are two solutions, the solution closest to zero should be preferred.

void getIndexes(std::vector<double>& V, double r)
{
 .... 
}

int main()
{
   std::vector<double> V;
   // populate V...
   double r = 0.3;
   getIndexes(V,r);
   return 0;
}

Note: It seems like there are a few SO'ers that aren't in the mood of reading the question completely. So lets all note the following:

  1. The solution, aka the sum may be larger than r - hence any strategy incrementally subtracting fractions from r, until it hits zero or near zero is wrong

  2. There are examples of r, where there will be 2 solutions, that is |r-s0| == |r-s1| and s0 < s1 - in this case s0 should be selected, this makes the problem slightly more difficult, as the knapsack style solutions tend to greedy overestimates first.

  3. If you believe this problem is trivial, you most likely haven't understood it. Hence it would be a good idea to read the question again.

EDIT (Matthieu M.): 2 examples for V = {1/2, 1/4, 1/8, 1/16, 1/32}

  • r = 0.3, S = {1, 3}
  • r = 0.256652, S = {1}
like image 989
Xander Tulip Avatar asked May 07 '12 06:05

Xander Tulip


4 Answers

Algorithm

Consider a target number r and a set F of fractions {1/2, 1/4, ... 1/(2^N)}. Let the smallest fraction, 1/(2^N), be denoted P.

Then the optimal sum will be equal to:

S = P * round(r/P)

That is, the optimal sum S will be some integer multiple of the smallest fraction available, P. The maximum error, err = r - S, is ± 1/2 * 1/(2^N). No better solution is possible because this would require the use of a number smaller than 1/(2^N), which is the smallest number in the set F.

Since the fractions F are all power-of-two multiples of P = 1/(2^N), any integer multiple of P can be expressed as a sum of the fractions in F. To obtain the list of fractions that should be used, encode the integer round(r/P) in binary and read off 1 in the kth binary place as "include the kth fraction in the solution".

Example:

Take r = 0.3 and F as {1/2, 1/4, 1/8, 1/16, 1/32}.

  1. Multiply the entire problem by 32.

    Take r = 9.6, and F as {16, 8, 4, 2, 1}.

  2. Round r to the nearest integer.

    Take r = 10.

  3. Encode 10 as a binary integer (five places)

    10 = 0b 0 1 0 1 0    ( 8 + 2 )
            ^ ^ ^ ^ ^
            | | | | |
            | | | | 1
            | | | 2
            | | 4
            | 8
            16
    
  4. Associate each binary bit with a fraction.

       = 0b 0 1 0 1 0    ( 1/4 + 1/16 = 0.3125 )
            ^ ^ ^ ^ ^
            | | | | |
            | | | | 1/32
            | | | 1/16
            | | 1/8
            | 1/4
            1/2
    

Proof

Consider transforming the problem by multiplying all the numbers involved by 2**N so that all the fractions become integers.

The original problem:

Consider a target number r in the range 0 < r < 1, and a list of fractions {1/2, 1/4, .... 1/(2**N). Find the subset of the list of fractions that sums to S such that error = r - S is minimised.

Becomes the following equivalent problem (after multiplying by 2**N):

Consider a target number r in the range 0 < r < 2**N and a list of integers {2**(N-1), 2**(N-2), ... , 4, 2, 1}. Find the subset of the list of integers that sums to S such that error = r - S is minimised.

Choosing powers of two that sum to a given number (with as little error as possible) is simply binary encoding of an integer. This problem therefore reduces to binary encoding of a integer.

  • Existence of solution: Any positive floating point number r, 0 < r < 2**N, can be cast to an integer and represented in binary form.
  • Optimality: The maximum error in the integer version of the solution is the round-off error of ±0.5. (In the original problem, the maximum error is ±0.5 * 1/2**N.)
  • Uniqueness: for any positive (floating point) number there is a unique integer representation and therefore a unique binary representation. (Possible exception of 0.5 = see below.)

Implementation (Python)

This function converts the problem to the integer equivalent, rounds off r to an integer, then reads off the binary representation of r as an integer to get the required fractions.

def conv_frac (r,N):
    # Convert to equivalent integer problem.
    R = r * 2**N
    S = int(round(R))

    # Convert integer S to N-bit binary representation (i.e. a character string
    # of 1's and 0's.) Note use of [2:] to trim leading '0b' and zfill() to
    # zero-pad to required length.
    bin_S = bin(S)[2:].zfill(N)

    nums = list()
    for index, bit in enumerate(bin_S):
        k = index + 1
        if bit == '1':
            print "%i : 1/%i or %f" % (index, 2**k, 1.0/(2**k))
            nums.append(1.0/(2**k))
    S = sum(nums)
    e = r - S

    print """
    Original number        `r` : %f
    Number of fractions    `N` : %i (smallest fraction 1/%i)
    Sum of fractions       `S` : %f
    Error                  `e` : %f
    """ % (r,N,2**N,S,e)

Sample output:

>>> conv_frac(0.3141,10)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500
8 : 1/512 or 0.001953

    Original number        `r` : 0.314100
    Number of fractions    `N` : 10 (smallest fraction 1/1024)
    Sum of fractions       `S` : 0.314453
    Error                  `e` : -0.000353

>>> conv_frac(0.30,5)
1 : 1/4 or 0.250000
3 : 1/16 or 0.062500

    Original number        `r` : 0.300000
    Number of fractions    `N` : 5 (smallest fraction 1/32)
    Sum of fractions       `S` : 0.312500
    Error                  `e` : -0.012500

Addendum: the 0.5 problem

If r * 2**N ends in 0.5, then it could be rounded up or down. That is, there are two possible representations as a sum-of-fractions.

If, as in the original problem statement, you want the representation that uses fewest fractions (i.e. the least number of 1 bits in the binary representation), just try both rounding options and pick whichever one is more economical.

like image 176
Li-aung Yip Avatar answered Nov 08 '22 22:11

Li-aung Yip


Perhaps I am dumb...

The only trick I can see here is that the sum of (1/2)^(i+1) for i in [0..n) where n tends towards infinity gives 1. This simple fact proves that (1/2)^i is always superior to sum (1/2)^j for j in [i+1, n), whatever n is.

So, when looking for our indices, it does not seem we have much choice. Let's start with i = 0

  • either r is superior to 2^-(i+1) and thus we need it
  • or it is inferior and we need to choose whether 2^-(i+1) OR sum 2^-j for j in [i+2, N] is closest (deferring to the latter in case of equality)

The only step that could be costly is obtaining the sum, but it can be precomputed once and for all (and even precomputed lazily).

// The resulting vector contains at index i the sum of 2^-j for j in [i+1, N]
// and is padded with one 0 to get the same length as `v`
static std::vector<double> partialSums(std::vector<double> const& v) {
    std::vector<double> result;

    // When summing doubles, we need to start with the smaller ones
    // because of the precision of representations...

    double sum = 0;
    BOOST_REVERSE_FOREACH(double d, v) {
        sum += d;
        result.push_back(sum);
    }

    result.pop_back(); // there is a +1 offset in the indexes of the result

    std::reverse(result.begin(), result.end());

    result.push_back(0); // pad the vector to have the same length as `v`

    return result;   
}

// The resulting vector contains the indexes elected
static std::vector<size_t> getIndexesImpl(std::vector<double> const& v,
                                          std::vector<double> const& ps,
                                          double r)
{
  std::vector<size_t> indexes;

  for (size_t i = 0, max = v.size(); i != max; ++i) {
      if (r >= v[i]) {
          r -= v[i];
          indexes.push_back(i);
          continue;
      }

      // We favor the closest to 0 in case of equality
      // which is the sum of the tail as per the theorem above.
      if (std::fabs(r - v[i]) < std::fabs(r - ps[i])) {
          indexes.push_back(i);
          return indexes;
      }
  }

  return indexes;
}

std::vector<size_t> getIndexes(std::vector<double>& v, double r) {
    std::vector<double> const ps = partialSums(v);
    return getIndexesImpl(v, ps, r);
}

The code runs (with some debug output) at ideone. Note that for 0.3 it gives:

0.3:
   1: 0.25
   3: 0.0625
=> 0.3125

which is slightly different from the other answers.

like image 25
Matthieu M. Avatar answered Nov 09 '22 00:11

Matthieu M.


At the risk of downvotes, this problem seems to be rather straightforward. Just start with the largest and smallest numbers you can produce out of V, adjust each index in turn until you have the two possible closest answers. Then evaluate which one is the better answer.

Here is untested code (in a language that I don't write):

void getIndexes(std::vector<double>& V, double r)
{
  double v_lower = 0;
  double v_upper = 1.0 - 0.5**V.size();
  std::vector<int> index_lower;
  std::vector<int> index_upper;

  if (v_upper <= r)
  {
    // The answer is trivial.
    for (int i = 0; i < V.size(); i++)
      cout << i;
    return;
  }

  for (int i = 0; i < N; i++)
  {
    if (v_lower + V[i] <= r)
    {
      v_lower += V[i];
      index_lower.push_back(i);
    }

    if (r <= v_upper - V[i])
      v_upper -= V[i];
    else
      index_upper.push_back(i);
  }

  if (r - v_lower < v_upper - r)
    printIndexes(index_lower);
  else if (v_upper - r < r - v_lower)
    printIndexes(index_upper);
  else if (v_upper.size() < v_lower.size())
    printIndexes(index_upper);
  else
    printIndexes(index_lower);
}

void printIndexes(std::vector<int>& ind)
{
  for (int i = 0; i < ind.size(); i++)
  {
    cout << ind[i];
  }
}

Did I get the job! :D

(Please note, this is horrible code that relies on our knowing exactly what V has in it...)

like image 4
btilly Avatar answered Nov 08 '22 22:11

btilly


I will start by saying that I do believe that this problem is trivial...

(waits until all stones have been thrown)

Yes, I did read the OP's edit that says that I have to re-read the question if I think so. Therefore I might be missing something that I fail to see - in this case please excuse my ignorance and feel free to point out my mistakes.

I don't see this as a dynamic programming problem. At the risk of sounding naive, why not try keeping two estimations of r while searching for indices - namely an under-estimation and an over-estimation. After all, if r does not equal any sum that can be computed from elements of V, it will lie between some two sums of the kind. Our goal is to find these sums and to report which is closer to r.

I threw together some quick-and-dirty Python code that does the job. The answer it reports is correct for the two test cases that the OP provided. Note that if the return is structured such that at least one index always has to be returned - even if the best estimation is no indices at all.

def estimate(V, r):
  lb = 0               # under-estimation (lower-bound)
  lbList = []
  ub = 1 - 0.5**len(V) # over-estimation = sum of all elements of V
  ubList = range(len(V))

  # calculate closest under-estimation and over-estimation
  for i in range(len(V)):
    if r == lb + V[i]:
      return (lbList + [i], lb + V[i])
    elif r == ub:
      return (ubList, ub)
    elif r > lb + V[i]:
      lb += V[i]
      lbList += [i]
    elif lb + V[i] < ub:
      ub = lb + V[i]
      ubList = lbList + [i]
  return (ubList, ub) if ub - r < r - lb else (lbList, lb) if lb != 0 else ([len(V) - 1], V[len(V) - 1])

# populate V
N = 5 # number of elements
V = []
for i in range(1, N + 1):
  V += [0.5**i]

# test
r = 0.484375 # this value is equidistant from both under- and over-estimation
print "r:", r
estimate = estimate(V, r)
print "Indices:", estimate[0]
print "Estimate:", estimate[1]

Note: after finishing writing my answer I noticed that this answer follows the same logic. Alas!

like image 2
Artyom Avatar answered Nov 08 '22 22:11

Artyom