The Online Encyclopedia of Integer Sequences supports search for sequences containing your query as a subsequence, eg. searching for subseq:212,364,420,428
will return the 8*n+4
sequence. (http://oeis.org/search?q=subseq:212,364,420,428)
This amazing feature was apparently implemented by Russ Cox as by http://oeis.org/wiki/User:Russ_Cox/OEIS_Server_Features, but it is not specified with what algorithm this is implemented.
I'm wondering how it is done. Clearly going through nearly a million of sequences for every search is impractical for a search engine. Just keeping an index (which is how the same Russ Cox did Google Code Regex Search) of the first number and brute forcing the rest also doesn't work, as numbers like 0
is in nearly all sequences. In fact some queries like 0 1
match a high percentage of the total database, so the algorithm needs a running time sensitive to the desired output size.
Does anyone happen to know how this feature is implemented?
My guess is part of the data is stored in an inverted index. That is each number is linked to a set of series, and when multiple sequences are entered, the set of common sequences is shown. This is extremely fast and used by almost every search engine.
Storing as a suffix trees or any linked data structure is useless for this application.
At least for some set of sequences (eg ax+b), I think it is better to save them parametrically rather than storing the actual sequence.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With