I have a set of strings; 90% of them are URLs starting with "http://www.". I want to sort them alphabetically.
Currently I use C++ std::sort(), but std::sort is a comparison-based variant of quicksort, and comparing two strings that share a long common prefix is not efficient. However, (I think) a radix sort won't work either, since most strings would land in the same bucket because of the long common prefix.
Is there any better algorithm than normal quick-sort/radix-sort for this problem?
In all cases tested, burstsort is the fastest string sorting method; for the largest data set, it is almost twice as fast as the best of the previous methods, and almost four times as fast as an efficient implementation of quicksort (Bentley & McIlroy 1993).
The longest common prefix of an array of strings is the common prefix of its two most dissimilar strings (the lexicographically smallest and largest). For example, in the array {"apple", "ape", "zebra"}, there is no common prefix, because the two most dissimilar strings, "ape" and "zebra", do not share any starting characters.
I would suspect that the processing time you spend trying to exploit common prefixes on the order of 10 characters per URL doesn't even pay for itself when you consider the average length of URLs.
Just try a completely standard sort first. If that's not fast enough, look at parallelizing or distributing it. It's a straightforward approach that will work.
Common prefixes naturally suggest that a trie data structure could be useful. The idea is to build a trie of all the words, keeping each node's children in a sorted list; since at any given node we only need to order that node's children, a recursive solution reveals itself naturally. See this for more inspiration: http://goanna.cs.rmit.edu.au/~jz/fulltext/acsc03sz.pdf