
can't explain sort(1) behaviour

Tags: linux, sorting, ls

I was puzzled when ls listed the following files in a strange order:

Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv

From a human perspective, 'I' should come first, then 'II', and so on.

So I created a file with the following content:

$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

If I sort it, I get the same order back:

$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The

However, if I remove the '-' and everything after it, the file sorts correctly:

$ cat 1
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode I 
Star Wars Episode IV 
Star Wars Episode VI 
Star Wars Episode V 

$ sort 1
Star Wars Episode I 
Star Wars Episode II 
Star Wars Episode III 
Star Wars Episode IV 
Star Wars Episode V 
Star Wars Episode VI 

But as soon as I add any character after the space, the sort order becomes unpredictable to me:

$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u

$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u

Any hints on this sort behaviour?

Update: sort: using ‘en_CA.UTF-8’ sorting rules

Update #2: as per the comment below, it is because of the locale.

$ ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
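
For anyone who wants to reproduce this without the .mkv files, here is a minimal sketch using the shortened titles from the test file above. The variable names are only for illustration. It uses LC_ALL rather than LANG, on the assumption that LC_ALL or LC_COLLATE might already be set in the environment, in which case LANG=C alone would have no effect on collation.

```shell
# Shortened titles from the question, in the order the UTF-8 locale produced.
titles='Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The'

# LC_ALL takes precedence over LC_COLLATE and LANG, so this forces
# plain byte-order comparison for this one command only.
byte_sorted=$(printf '%s\n' "$titles" | LC_ALL=C sort)

printf '%s\n' "$byte_sorted"
```

In byte order the space (0x20) compares lower than any letter, which is why 'I -' lands before 'II -' and 'V -' before 'VI -'.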

Why does a UTF-8 locale make a difference, then? I checked with ru_RU.UTF8 (incorrect sorting) and ru_RU.KOI8-R (proper sorting).

Update #3: It is about the locale: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

asked Dec 05 '13 16:12 by stimur

1 Answer

I think I found the proper explanation of this:

GNU coreutils FAQ: Sort does not sort in normal order

Found it on: sort not sorting as expected (space and locale)
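
As I understand the FAQ entry, the locale's collation rules can ignore punctuation (and fold case) when comparing lines. The sketch below is only a rough approximation of that idea, not what the C library actually does: it builds a comparison key with spaces and hyphens removed, sorts by that key in byte order, and then drops the key. The variable names are made up for the illustration.

```shell
# Shortened titles from the question, deliberately in "human" order.
titles='Star Wars Episode I - The
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode IV - A
Star Wars Episode V - The
Star Wars Episode VI - Return'

approx_locale_sorted=$(
  printf '%s\n' "$titles" |
  # Decorate: prefix each line with a key that has spaces and hyphens removed.
  awk '{ key = $0; gsub(/[ -]/, "", key); print key "\t" $0 }' |
  # Sort by byte order; the key comes first, so it decides the order.
  LC_ALL=C sort |
  # Undecorate: drop the key and keep the original line.
  cut -f2-
)

printf '%s\n' "$approx_locale_sorted"
```

With the spaces and hyphens gone, 'II - Attack' is compared as 'IIAttack' and 'I - The' as 'IThe', so 'II' and 'III' sort ahead of 'I', and 'VI' ahead of 'V'. That is the same order the en_CA.UTF-8 sort produced in the question.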

answered Sep 30 '22 05:09 by stimur
