I'm looking at the pseudo-code given in figure 3 of the original paper introducing suffix arrays "SUFFIX ARRAYS: A NEW METHOD FOR ON-LINE STRING SEARCHES".
I can't figure out the logic for lines 4 and 5 (indexing from 0). The lines reads:
else if r < P or wr ≤ aPos[N-1]+rthen
LW ← N
W
is the pattern of length 'P' we're looking for and r
is lcp(A[pos[N-1]:], W)
. The problem is that in almost all cases, this lcp
will be less than length of 'W'. This conditional is meant to handle the case (I think) that the pattern is lexicographically larger than the lexicographically greatest suffix in the array, but it doesn't test this at all. Lines 2 & 3, on the other hand, which test if W
is less than the lexicographically smallest suffix seem to make perfect sense
if l = P or wl ≤ aPos[0]+lthen
LW ← 0
I believe that the original lines should read something like:
else if r < P and wr > aPos[N-1]+rthen
LW ← N
The only way that W
can be greater than A[pos[N-1]:]
is if it has an lcp
shorter than the length of the pattern (otherwise, all of W
matches and so W
cannot be greater, only lesser or equal to the thing with which we're sharing the lcp
) AND if the character after the lcp
is greater in W
than in A[pos[N-1]]
. Does this seem to make sense? Is this an error in the original paper? If not, can someone please explain to me how I'm misinterpreting the original code?
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics. Suffix array.
A suffix array can be constructed from Suffix tree by doing a DFS traversal of the suffix tree. In fact Suffix array and suffix tree both can be constructed from each other in linear time. A simple method to construct suffix array is to make an array of all suffixes and then sort the array.
I think you understand the paper correct and in fact it has an error.
Consider the following example: let A = banana
, W = nana
. Then A[pos[N-1]:] = nana
. Algorithm sets LW = N
or even fails while in fact it's N-1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With