Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Composing an average stream piecewise

Tags:

algorithm

I have a list of n floating point streams each having a different size.
The streams can be be composed together using the following rules:
You can put a stream starting at any point in time (its zero before it started). You can use the same stream few times (it can overlap itself and even be in the same position few times) and you are allowed to not use a certain stream at all.
e.g.:
input streams:

1 2 3 4
2 4 5 6 7
1 5 6

Can be composed like:

  1 2 3 4
1 5 6
        1 5 6

After the placements an output stream is composed by the rule that each output float equals to the square root of the sum of the square of each term.
e.g.:
If the streams at a position are:

1
2
3

The output is:

sqrt(1*1 + 2*2 + 3*3) = sqrt(14) = 3.74...

So for the the example composition:

  1 2 3 4
1 5 6
        1 5 6

The output is:

1 5.09 6.32 3 4.12 5 6

What I have is the output stream and the input streams. I need to compute the composition that lead to that output. an exact composition doesn't have to exists - I need a composition as close as possible to the output (smallest accumulated difference).
e.g.:
Input:
Stream to mimic:

1 5.09 6.32 3 4.12 5 6

and a list:

1 2 3 4
2 4 5 6 7
1 5 6

Expected output:

Stream 0 starting at 1,
Stream 2 starting at 0,
Stream 2 starting at 4.

This seems like an NP problem, is there any fast way to solve this? it can be somewhat brute force (but not totally, its not theoretic problem) and it can give not the best answer as long as its close enough.

The algorithm will be usually used with stream to mimic with very long length (can be few megabytes) while it will have around 20 streams to be composed from, while each stream will be around kilobyte long.

like image 908
Dani Avatar asked Sep 26 '11 18:09

Dani


1 Answers

I think you can speed up a greedy search a bit over the obvious. First of all, square each element in all of the streams involved. Then you are looking for a sum of squared streams that looks a lot like the squared target stream. Suppose that "it looks like" is the euclidean distance between the squared streams, considered as vectors.

Then we have (a-b)^2 = a^2 + b^2 - 2a.b. So if we can find the dot product of two vectors quickly, and we know their absolute size, we can find the distance quickly. But using the FFT and the http://en.wikipedia.org/wiki/Convolution_theorem, we can work out a.b_i where a is the target stream and b_i is stream b at some offset of i, by using the FFT to convolve a reversed version of b - for the cost of doing an FFT on a, an FFT on reversed b, and an FFT on the result, we get a.b_i for every offset i.

If we do a greedy search, the first step will be to find the b_i that makes (a-b_i)^2 smallest and subtract it from a. Then we are looking for a stream c_j that makes (a-b_i-c_j)^2 as small as possible. But this is a^2 + b_i^2 + c_j^2 - 2a.b_i - 2a.c_j + 2b_i.c_j and we have already calculated everything except b_i.c_j in the step above. If b and c are shorter streams it will be cheap to calculate b_i.c_j, and we can use the FFT as before.

So we have a not too horrible way to do a greedy search - at each stage subtract off the stream from the adjusted target stream so far that makes the residual smallest (considered as vectors in euclidean space), and carry on from there. At some stage we will find that none of the streams we have available make the residual any smaller. We can stop there, because our calculation above shows us that using two streams at once won't help either then - this follows because b_i.c_j >= 0, since each element of b_i is >= 0, because it is a square.

If you do a greedy search and are not satisfied, but have more cpu to burn, try Limited Discrepancy Search.

like image 66
mcdowella Avatar answered Oct 02 '22 05:10

mcdowella