Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does OEIS do subsequence search?

The Online Encyclopedia of Integer Sequences supports search for sequences containing your query as a subsequence, eg. searching for subseq:212,364,420,428 will return the 8*n+4 sequence. (http://oeis.org/search?q=subseq:212,364,420,428)

This amazing feature was apparently implemented by Russ Cox as by http://oeis.org/wiki/User:Russ_Cox/OEIS_Server_Features, but it is not specified with what algorithm this is implemented.

I'm wondering how it is done. Clearly going through nearly a million of sequences for every search is impractical for a search engine. Just keeping an index (which is how the same Russ Cox did Google Code Regex Search) of the first number and brute forcing the rest also doesn't work, as numbers like 0 is in nearly all sequences. In fact some queries like 0 1 match a high percentage of the total database, so the algorithm needs a running time sensitive to the desired output size.

Does anyone happen to know how this feature is implemented?

like image 910
Thomas Ahle Avatar asked Aug 16 '14 13:08

Thomas Ahle


Video Answer


1 Answers

My guess is part of the data is stored in an inverted index. That is each number is linked to a set of series, and when multiple sequences are entered, the set of common sequences is shown. This is extremely fast and used by almost every search engine.

Storing as a suffix trees or any linked data structure is useless for this application.

At least for some set of sequences (eg ax+b), I think it is better to save them parametrically rather than storing the actual sequence.

like image 83
ElKamina Avatar answered Nov 13 '22 01:11

ElKamina