
Sort & uniq in Linux shell

What is the difference between the following two commands?

sort -u FILE
sort FILE | uniq
yassin asked Aug 01 '10 17:08




1 Answer

Using sort -u does less I/O than sort | uniq, but the end result is the same. In particular, if the file is big enough that sort has to create intermediate files, there's a decent chance that sort -u will use slightly fewer or slightly smaller intermediate files, since it can eliminate duplicates as it sorts each set. If the data is highly duplicative, this could be beneficial; if there are few duplicates, it won't make much difference (definitely a second-order performance effect, compared to the first-order effect of the pipe).
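A quick way to check the "same end result" claim on a small input is the sketch below. The sample data and the file name fruits.txt are assumptions made up for illustration; they are not from the question.

```shell
# Hypothetical sample data in a scratch directory (names are illustrative).
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\nbanana\napple\n' > "$tmpdir/fruits.txt"

# One process that sorts and drops duplicates in the same pass.
out_u=$(sort -u "$tmpdir/fruits.txt")

# Two processes connected by a pipe: sort first, then drop adjacent duplicates.
out_pipe=$(sort "$tmpdir/fruits.txt" | uniq)

# Compare the two results.
if [ "$out_u" = "$out_pipe" ]; then
    echo "identical output"
else
    echo "outputs differ"
fi

rm -rf "$tmpdir"
```

This only compares the output; it says nothing about the I/O difference, which would need a much larger file to observe.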

Note that there are times when the piping is appropriate. For example:

sort FILE | uniq -c | sort -n 

This sorts the file into order of the number of occurrences of each line in the file, with the most repeated lines appearing last. (It wouldn't surprise me to find that this combination, which is idiomatic for Unix or POSIX, can be squished into one complex 'sort' command with GNU sort.)
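As a rough illustration of that pipeline, the sketch below runs it on a made-up six-line input (the data and variable names are assumptions) and then pulls out the last line, which should be the most frequent one.

```shell
# Hypothetical input: 'b' appears three times, 'a' twice, 'c' once.
input=$(mktemp)
printf 'b\na\nb\nc\nb\na\n' > "$input"

# Sort so duplicates are adjacent, count each run, then order by the count.
counts=$(sort "$input" | uniq -c | sort -n)
printf '%s\n' "$counts"

# The last line carries the highest count; its second field is the line text.
most_common=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $2}')
echo "most common: $most_common"

rm -f "$input"
```

The first sort is what makes uniq -c meaningful here, since uniq only collapses adjacent duplicates.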

There are times when not using the pipe is important. For example:

sort -u -o FILE FILE 

This sorts the file 'in situ'; that is, the output file is specified by -o FILE, and this operation is guaranteed safe (the file is read before being overwritten for output).
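The sketch below contrasts -o with a plain shell redirection onto the same file, which is the usual way this goes wrong. The scratch files and variable names are assumptions for illustration only.

```shell
# Hypothetical scratch files with identical contents.
safe=$(mktemp)
clobbered=$(mktemp)
printf 'b\na\nb\n' > "$safe"
printf 'b\na\nb\n' > "$clobbered"

# Safe: sort reads its input before it opens the -o file for writing.
sort -u -o "$safe" "$safe"
safe_result=$(cat "$safe")

# Unsafe contrast: the shell truncates the file for the redirection
# before sort starts, so sort sees an empty input.
sort -u "$clobbered" > "$clobbered"
clobbered_bytes=$(wc -c < "$clobbered")

printf 'after -o:\n%s\n' "$safe_result"
printf 'bytes left after redirect: %s\n' "$clobbered_bytes"

rm -f "$safe" "$clobbered"
```

The same truncation problem applies to sort FILE | uniq > FILE, which is why the single sort -u -o FILE FILE form matters when the output must replace the input.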

Jonathan Leffler answered Oct 16 '22 13:10
