I'm working on a large project; I won't bother to summarize it here, but this part of it takes a very large document of text (a minimum of around 50,000 words, not all unique) and outputs each unique word in order from most used to least used (the top three will probably be "a", "an", and "the").
My question is, of course: what would be the best sorting algorithm to use? I was reading about counting sort, and I like it, but my concern is that the range of values will be too large compared to the number of unique words.
Any suggestions?
Quicksort is one of the most efficient sorting algorithms, which makes it one of the most used as well. The first step is to select a pivot; this element partitions the data so that the smaller values end up on its left and the greater values on its right.
Because it has the upper hand in the average case for most inputs, Quicksort is generally considered the “fastest” sorting algorithm.
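To make that concrete, here is a minimal quicksort sketch in C++ (a textbook Lomuto-partition version for illustration only; a real program would normally just call std::sort):

```cpp
#include <utility>
#include <vector>

// Lomuto partition: move the pivot (last element) into its final position,
// with smaller elements to its left and greater-or-equal ones to its right.
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (a[j] < pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    }
    std::swap(a[i], a[hi]);
    return i;  // final index of the pivot
}

// Sort a[lo..hi] by partitioning around a pivot and recursing on both sides.
void quicksort(std::vector<int>& a, int lo, int hi) {
    if (lo < hi) {
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }
}
```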
As for which algorithm is slowest on large inputs: Quicksort, Heapsort, and Shellsort all have a best-case time complexity of O(n log n), while Bubble sort takes O(n²) time in the average and worst cases, so Bubble sort is the slowest of these.
First, you will need a map of word -> count. 50,000 words is not much - it will easily fit in memory, so there's nothing to worry about. In C++ you can use std::map from the standard library.
Then, once you have the map, you can copy all the map keys to a vector.
Then, sort this vector using a custom comparison operator: instead of comparing the words, compare the counts from the map. (Don't worry about the specific sorting algorithm - your array is not that large, so any standard library sort will work for you.)
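Here is a minimal sketch of that approach, assuming the document has already been tokenized into a std::vector<std::string> (the sample input below is illustrative). One small variation: it copies the (word, count) pairs into the vector rather than just the keys, which avoids map lookups inside the comparator:

```cpp
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Illustrative stand-in for the tokenized 50,000-word document.
    std::vector<std::string> words = {"the", "a", "the", "an", "a", "the"};

    // Step 1: build the word -> count map.
    std::map<std::string, int> counts;
    for (const auto& w : words) {
        ++counts[w];
    }

    // Step 2: copy the (word, count) pairs into a vector.
    std::vector<std::pair<std::string, int>> freq(counts.begin(), counts.end());

    // Step 3: sort by count, most used first; the standard library
    // chooses the sorting algorithm (typically introsort) for us.
    std::sort(freq.begin(), freq.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });

    for (const auto& [word, count] : freq) {
        std::cout << word << ": " << count << '\n';
    }
}
```

For the sample input this prints "the: 3", then "a: 2", then "an: 1".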
I'd start with a quicksort and go from there.
Check out the wiki page on sorting algorithms, though, to learn the differences.