I really don't know what the name of this problem is, but it's something like lossy compression, and I have a bad English, but I will try to describe it as much as I can.
Suppose I have list of unsorted unique numbers from unknown source, the length is usually between 255 to 512 with a range from 0 to 512.
I wonder if there is some kind of an algorithm that reads the data and return something like a seed number that I can use to generate a list somehow close to the original but with some degree of error.
For example
original list
{5, 13, 25, 33, 3, 10}
regenerated list
{4, 10, 30, 30, 5, 5} or {8, 20, 20, 35, 5, 9} //and so on
Does this problem have a name, and is there an algorithm that can do what I just described?
Is it the same as Monte Carlo method because from what I understand it isn't.
Is it possible to use some of the techniques used in lossy compression to get this kind of approximation ?
What I tried to do to solve this problem is to use a simple 16 bit RNG and brute-force all the possible values comparing them to the original list and pick the one with the minimum difference, but I think this way is rather dumb and inefficient.
This is indeed lossy compression.
You don't tell us the range of the values in the list. From the samples you give we can extrapolate that they count at least 6 bits each (0 to 63). In total, you have from 0 to 3072 bits to compress.
If these sequences have no special property and appear to be random, I doubt there is any way to achieve significant compression. Think that the probability of an arbitrary sequence to be matched from a 32 bits seed is 2^32.2^(-3072)=7.10^(-916), i.e. less than infinitesimal. If you allow 10% error on every value, the probability of a match is 2^32.0.1^512=4.10^(-503).
A trivial way to compress with 12.5% accuracy is to get rid of the three LSB of each value, leading to 50% savings (1536 bits), but I doubt this is what you are looking for.
Would be useful to measure the entropy of the sequences http://en.wikipedia.org/wiki/Entropy_(information_theory) and/or possible correlations between the values. This can be done by plotting all (V, Vi+1) pairs, or (Vi, Vi+1, Vi+2) triples and looking for patterns.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With