Finding all pairs of sequences that differ at exactly one position

Question

I need a data structure representing a set of sequences (all of the same, known, length) with the following non-standard operation:

Find two sequences in the set that differ at exactly one index. (Or establish that no such pair exists.)

If N is the length of the sequences and M the number of sequences, there is an obvious O(N*M*M) algorithm. I wonder if there is a standard way of solving this more efficiently. I'm willing to apply pre-processing if needed.

Bonus points if instead of returning a pair, the algorithm returns all sequences that differ at the same point.

Alternatively, I am also interested in a solution where I can check efficiently whether a particular sequence differs at one index with from any sequence in the set. If it helps, we can assume that in the set, no two sequences have that property.

Edit: you can assume N to be reasonably small. By this, I mean improvements such as O(log(N)*M*M) are not immediately useful for my use case.

Falk Hüffner · Accepted Answer

For each sequence and each position i in that sequence, calculate a hash of the sequence without position i and add it to a hash table. If there is already an entry in the table, you have found a potential pair that differs only in one position. Using rolling hashes from both start and end and combining them, you can calculate each hash in constant time. The total running time is expected O(N*M).

Finding all pairs of sequences that differ at exactly one position

Tags:

algorithm

dynamic-programming

Philippe

1 Answers

Falk Hüffner

Recent Activity

Donate For Us

Finding all pairs of sequences that differ at exactly one position

Tags:

algorithm

dynamic-programming

Philippe

1 Answers

Falk Hüffner

Related questions

Recent Activity

Donate For Us