Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Suffix array nlogn creation

I have been learning suffix arrays creation, & i understand that We first sort all suffixes according to first character, then according to first 2 characters, then first 4 characters and so on while the number of characters to be considered is smaller than 2n.

But my doubt is why don't we choose the first 3 characters, then 9... and so on. Why only 2 characters are taken into account since the strings are a part of same strings and not different random strings?

like image 897
user2565192 Avatar asked Jul 17 '16 07:07

user2565192


1 Answers

I haven't analyzed the suffix array construction algorithm thoroughly, but still would like to share my thoughts.

In my humble opinion, your question is similar to the following ones:

  • Why do computers use binary encoding of information instead of ternary?

  • Why does binary search bisect the range instead of trisecting it?

  • Why are there two sexes rather than three?

The reason is that the number 2 is special - it is the smallest plural number. The difference between 1 and 2 is qualitative, whereas the difference between 2 and 3 (as well as any other positive integer) is quantitative and therefore not as drastic.

As a result, binary formulation of many algorithms and data structures turns out to be the simplest one, though some of them may be generalized, with various degrees of added complexity, for an arbitrary base.

like image 51
Leon Avatar answered Oct 13 '22 05:10

Leon