Discover periodic patterns in a large data-set

Question

I have a large sequence of tuples on disk in the form (t1, k1) (t2, k2) ... (tn, kn)

ti is a monotonically increasing timestamp and ki is a key (assume a fixed length string if needed). Neither ti nor ki are guaranteed to be unique. However, the number of unique tis and kis is huge (millions). n itself is very large (100 Million+) and the size of k (approx 500 bytes) makes it impossible to store everything in memory.

I would like to find out periodic occurrences of keys in this sequence.

For example, if I have the sequence (1, a) (2, b) (3, c) (4, b) (5, a) (6, b) (7, d) (8, b) (9, a) (10, b)

The algorithm should emit (a, 4) and (b, 2). That is a occurs with a period of 4 and b occurs with a period of 2.

If I build a hash of all keys and store the average of the difference between consecutive timestamps of each key and a std deviation of the same, I might be able to make a pass, and report only the ones that have an acceptable std deviation(ideally, 0). However, it requires one bucket per unique key, whereas in practice, I might have very few really periodic patterns. Any better ways?

Norman Ramsey · Accepted Answer

You could use discrete autocorrelation to find the periods, then search for the keys. The advantages of autocorrelation are that it's a little easier to understand what's going on in the discrete domain, and you don't have to worry about mapping the keys to anything—just use a characteristic function of two keys which is 1 when they are equal and 0 when they are unequal.

Rob Lachlan · Answer

This is more or less the reason for which Fourier transforms (Fast Fourier Transforms, etc.) were invented.

You are essentially transforming a sequence from the time (or some similar dimension) domain to a frequency domain. This is a very old problem, predating the application of computers, and there is an immense body of theory on the subject. Also see discrete fourier transform.

EDIT: You would have to transform your values k1, k2, ... somehow, but assuming that that's feasible, this approach should be, too.

Discover periodic patterns in a large data-set

Tags:

algorithm

Miner

2 Answers

Norman Ramsey

Rob Lachlan

Recent Activity

Donate For Us

Discover periodic patterns in a large data-set

Tags:

algorithm

Miner

2 Answers

Norman Ramsey

Rob Lachlan

Related questions

Recent Activity

Donate For Us