A lot of sorting algorithms are based on comparisons of numbers. If I correctly understand, when we are using comparison algorithms for alphabetical sort, we compare char codes(their integer presentation) and sort depending on their values. (That's why at ASCII
table letter B
has a bigger code then A
). But during this comparing we sort only by the first letter and not the whole word. When we use db query
with ORDER BY
we are getting sorting for the whole words. (As I understand the reason of it is db background mechanisms like indexes etc.). Also I heard about Radix sort
(sorry, but never used it before) and as I can see it can help with alphabetical sort(maybe I'm wrong).
What algorithm is better to use for sorting by the whole words?
Not correct:
Adam
Aaron
Antony
Correct:
Aaron
Adam
Antony
And am I correct with my assumptions about the whole workflow?
You're not quite correct with the assumption about "Compare only the first letter". The algorithm is - if the first letters are the same, compare the next letter. And the next. And the next. Until either you find some letters that are different, or one of the strings runs out.
Also note that simply comparing by ASCII codes is not always enough. Sometimes you need to do a case-insensitive comparison where you consider A
to be equal to a
. Sometimes you need to do accent-insensitive comparison where you consider ā
to be equal to a
. And sometimes you need to account for crazy language shit where ß
is equal to ss
or worse.
My advice is - your programming language should probably have some mechanism for comparing strings. Use that. Don't roll out your own.
After that, any sorting algorithm will work. They all use one simple assumption - that you can compare the items that you sort. Whether they are integers, strings or complex objects, is irrelevant. As long as you can take any two objects and say "this one is bigger and this one is smaller", you're good to go.
(Note also that you need to be consistent about it. If A==B and B==C, then also you need to make sure that A==C. Similarly if A < B and B < C, then you must A < C. Etc.)
No, the sorting is not based on first character or length. Alphabetical or better to put it as lexicographical ordering are done in the following way,
In C++ the comparison function would look like this,
bool operator<(const string &a, const string &b){
int l = min(a.size(),b.size());
for(int i = 0; i < l; i++){
if( a[i] > b[i]) return false; // a is greater than b
if( b[i] > a[i]) return true; // b is greater than a
}
if ( a.size() > l) return false; // a is greater than b
return true; // b is greater than a
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With