 

Is it true that sorting strings is O(n^2logn)? [duplicate]

I read the following:

Sorting takes O(N log N), so how is it O(N^2 log N)? What we miss here is that comparing two strings is not O(1); in the worst case, it takes O(N). So the final complexity is O(N^2 log N).

Is this correct? I had always assumed that sorting was O(N log N), but now I feel a little thrown off because here it has become O(N^2 log N).

If someone could explain why it's O(N^2 log N), that would be great.

Edit: quote taken from here: https://www.hackerrank.com/challenges/string-similarity/topics/suffix-array

asked Dec 16 '15 by jcm



2 Answers

Think of it this way.

When you are sorting numbers, deciding which of two numbers is bigger takes only one comparison.

But when sorting strings, deciding which string is bigger can require comparing all the characters of the two strings (e.g. comparing hello and hella takes 5 character comparisons).

So in that case, every string comparison takes time proportional to the length of the strings. If all the strings have the same length (say l), the complexity becomes O(l * n log n).


Do not confuse n with l. In any time complexity, n stands for the number of inputs; in your case that is the number of strings. The complexity becomes O(n^2 log n) only when the length of the strings is also n.

Otherwise, taking the length of the strings as l, the complexity is O(l * n log n).
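
To make the l factor concrete, here is a minimal Python sketch (my own illustration, not part of the original answer): it sorts n strings of equal length l with a comparison function that counts every character it looks at. The strings share a long common prefix, so each comparison costs close to l characters, which is the worst case described above.

import random
import string
from functools import cmp_to_key

char_comparisons = 0

def compare(a, b):
    # Compare character by character, counting each position examined.
    global char_comparisons
    for x, y in zip(a, b):
        char_comparisons += 1
        if x != y:
            return -1 if x < y else 1
    return len(a) - len(b)

n, l = 1000, 50
prefix = "x" * (l - 1)   # long shared prefix forces near-worst-case comparisons
data = [prefix + random.choice(string.ascii_lowercase) for _ in range(n)]
sorted(data, key=cmp_to_key(compare))
print(f"{char_comparisons} character comparisons for n={n}, l={l}")

The printed count is roughly l times the ~n log n string comparisons the sort performs, which is where O(l * n log n) comes from.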

answered Oct 04 '22 by Haris


You extracted that quote out of context; it's worth putting it back into context to understand what's going on.

The context is that we start with some string S of length N, and wish to sort the set of all possible suffixes of that string.

There are N possible non-empty suffixes, one for each character position, and the average length of a string in that set is (N+1)/2 (since every length from 1 to N occurs exactly once).

If we assume that the expected cost of comparing two strings of average length L is O(L), that the expected number of comparisons needed to sort N objects is O(N log N), and finally that L is (N+1)/2, then the expected time to comparison-sort this particular set of strings is O((N+1)/2 × N × log N), which simplifies to O(N^2 log N).
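
As a rough illustration of that accounting (my sketch, not from the cited article), you can build every suffix of S, shuffle them, and comparison-sort them while charging one unit per character examined. A string made of a single repeated character forces every comparison to scan the whole shorter suffix, which is the worst case the quote is describing.

import random
from functools import cmp_to_key

def count_suffix_sort_chars(s):
    chars_examined = 0

    def compare(a, b):
        nonlocal chars_examined
        for x, y in zip(a, b):
            chars_examined += 1
            if x != y:
                return -1 if x < y else 1
        return len(a) - len(b)

    suffixes = [s[i:] for i in range(len(s))]   # N suffixes, average length (N+1)/2
    random.shuffle(suffixes)                    # avoid handing the sort an already-ordered run
    sorted(suffixes, key=cmp_to_key(compare))
    return chars_examined

print(count_suffix_sort_chars("a" * 2000))      # grows roughly like N^2 log N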

But the expected cost of comparing two strings of length L is not actually O(L), because it is generally not necessary to look at every character position of the two strings in order to compare them. It is sufficient to look at characters until the first mismatch is found. If the strings are randomly distributed in the universe of possible strings, then the expected number of characters which need to be examined is actually O(1) (by a simple probabilistic argument), although the worst-case cost is O(L).
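
Here is a small simulation of that probabilistic argument (my sketch, assuming uniformly random strings over a 26-letter alphabet): the expected number of characters examined before the first mismatch is about 26/25 ≈ 1.04, a constant independent of L.

import random
import string

def chars_examined(a, b):
    # Number of positions looked at before the first mismatch (or end of string).
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count

L, trials = 100, 100_000
total = 0
for _ in range(trials):
    a = "".join(random.choices(string.ascii_lowercase, k=L))
    b = "".join(random.choices(string.ascii_lowercase, k=L))
    total += chars_examined(a, b)

print(total / trials)   # close to 1/(1 - 1/26) ≈ 1.04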

In the case of quicksort (at least), there is a counter-balance, which is the tendency for comparisons in later phases of the quicksort to be between strings which are closer together. In this case, the number of characters needed to be examined tends to approach log N, which is the expected length of the longest common prefix between two consecutive strings in a sorted uniformly-distributed sample of N strings.

So the actual expected cost of the sort should be O(N log^2 N).
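
To see that neighbour effect numerically (again my illustration, assuming uniformly random strings), sort N random strings and measure the average longest common prefix of consecutive entries; it grows roughly like log N taken to the base of the alphabet size, not like L.

import math
import random
import string

def lcp(a, b):
    # Length of the longest common prefix of a and b.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

N, L, sigma = 50_000, 40, 26
sample = sorted("".join(random.choices(string.ascii_lowercase, k=L)) for _ in range(N))
avg_lcp = sum(lcp(a, b) for a, b in zip(sample, sample[1:])) / (N - 1)
print(f"average neighbour LCP: {avg_lcp:.2f}, log_sigma(N): {math.log(N, sigma):.2f}")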

As a further complication, it is possible to modify the quicksort algorithm to track the LCPs of each partition, which might help restore the expected number of comparisons to something smaller than log N.

(A similar argument can be applied to radix sort, which also eliminates the apparent O(L) term from the expected -- but not worst-case -- time. The cited article proceeds to provide a much better algorithm for the suffix tree problem, which avoids comparison-sorting.)

answered Oct 04 '22 by rici