I was puzzled when ls listed the following files in a strange order:
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
From a human perspective, 'I' should come first, then 'II', and so on.
So I created a file with the following content:
$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
If I sort it, I get this:
$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
However, if I remove the '-' and everything after it, the file sorts correctly:
$ cat 1
Star Wars Episode II
Star Wars Episode III
Star Wars Episode I
Star Wars Episode IV
Star Wars Episode VI
Star Wars Episode V
$ sort 1
Star Wars Episode I
Star Wars Episode II
Star Wars Episode III
Star Wars Episode IV
Star Wars Episode V
Star Wars Episode VI
So as soon as I add any character after the space, the sort order becomes unpredictable to me:
$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u
$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u
Any hint on this sort behaviour?
Update: sort: using ‘en_CA.UTF-8’ sorting rules
Update #2: As per the comment below, it is because of the locale.
$ ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Why does a UTF-8 locale make a difference, then? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (proper sorting).
Update #3: It is about the locale: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
I think I found the proper explanation of this:
GNU coreutils FAQ: Sort does not sort in normal order
Found it on: sort not sorting as expected (space and locale)
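To see the effect without depending on which locales are installed, here is a minimal sketch using the six sample lines from the question. It assumes a POSIX shell with mktemp available; the variable names (tmp, c_sorted, nospace_sorted) are mine. The first sort forces byte-wise comparison with LC_ALL=C. The second deletes the spaces before comparing, which is only a rough stand-in for a collation that gives spaces little or no weight, not what the locale tables literally do, but it does reproduce the order shown in the question.

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf '%s\n' \
  'Star Wars Episode II y' \
  'Star Wars Episode III x' \
  'Star Wars Episode I z' \
  'Star Wars Episode IV w' \
  'Star Wars Episode VI v' \
  'Star Wars Episode V u' > "$tmp"

# Byte-wise comparison: a space (0x20) sorts before any letter,
# so "I z" comes before "II y".
c_sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$c_sorted"

# Rough approximation (an assumption, not the real collation rules):
# drop the spaces, then compare byte-wise. Now "IIIx" is compared
# against "IIy" and "Iz", which puts III first and I fourth.
nospace_sorted=$(tr -d ' ' < "$tmp" | LC_ALL=C sort)
printf '%s\n' "$nospace_sorted"

rm -f "$tmp"
```

LC_ALL is used here rather than LANG because it takes precedence over both LANG and LC_COLLATE, so the result does not depend on what else is set in the environment. The exact ordering under a UTF-8 locale can differ between C library versions, so the second listing should be read as an illustration only.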