Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

faster string sorting with long common prefix?

I have a set of strings. 90% of them are URLs start with "http://www.". I want to sort them alphabetically.

Currently I use C++ std::sort(). but std::sort is a variant of quick-sort based on comparison, and comparing two strings with long common prefix is not effecient. However (I think) a radix-sort won't work either, since most strings are put in the same bucket because of long common prefix.

Is there any better algorithm than normal quick-sort/radix-sort for this problem?

like image 661
richselian Avatar asked Apr 26 '13 22:04

richselian


People also ask

What is the fastest algorithm to sort a string?

In all cases tested, burstsort is the fastest string sorting method; for the largest data set, it is almost twice as fast as the best of the previous methods—and almost four times as fast as an efficient implemen- tation of quicksort (Bentley & McIlroy 1993).

What is the longest common prefix?

The longest common prefix for an array of strings is the common prefix between 2 most dissimilar strings. For example, in the given array {“apple”, “ape”, “zebra”}, there is no common prefix because the 2 most dissimilar strings of the array “ape” and “zebra” do not share any starting characters.


2 Answers

I would suspect that the processing time you spend trying to exploit common prefixes on the order of 10 characters per URL doesn't even pay for itself when you consider the average length of URLs.

Just try a completely standard sort. If that's not fast enough, look at parallelizing or distributing a completely standard sort. It's a straightforward approach that will work.

like image 125
Timothy Shields Avatar answered Sep 20 '22 13:09

Timothy Shields


Common Prefixes seem to naturally imply that a trie data structure could be useful. So the idea is to build a trie of all the words and then sort each node. The ordering should be that the children of a particular node reside in a list and are sorted. This can be done easily since at a particular node we need only sort the children, so naturally a recursive solution reveals itself. See this for more inspiration: http://goanna.cs.rmit.edu.au/~jz/fulltext/acsc03sz.pdf

like image 26
Anil Vaitla Avatar answered Sep 18 '22 13:09

Anil Vaitla