I want to sort a file on multiple fields and multiple field separator. Please help. Here is my sample data file:
$ cat Data3
My Text|50002/100/43
My Message|50001/100/7
Help Text|50001/100/7
Help Message|50002/100/11
Text Message|50001/100/63
Visible Text|50001/100/52
Invisible Text|50002/100/1
First field separator is a pipe symbol and second field separator is /
. I want to sort this data on second field first and then within that the data should be in sorted order of the last field (separate by /
). Finally my sorted data should look like this:
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43
By using sort -k2,2n -t'|'
, I am able to sort on field 2 (50001/50002
), but then within that value how can I sort on the last field (separated by /
)?
To sort by a delimiter pass the -t option to sort along with the delimiter value. For a CSV file this would be , . This can be combined with the -k option to sort on fields within a CSV. The result will be written to standard output.
Tip: To sort more than four records at once, use the CTRL key to select multiple fields, right-click, then choose to sort in ascending or descending order. The fields are sorted in the order you click them in the table.
Use the -k option to sort on a certain column. For example, use " -k 2 " to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data ( +2 for the third, etc.).
-k Option: Unix provides the feature of sorting a table on the basis of any column number by using -k option. Use the -k option to sort on a certain column. For example, use “-k 2” to sort on the second column.
The simplest trick for this data set is to treat the second column is a version number.
$ cat Data3 | sort -k2,2V -t'|'
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43
However, that doesn't always work depending on your input. This will work because the values in the second column are the same.
You could do what fedorqui suggested and run sort twice and the second time you do a stable sort. From the manpage: -s, --stable (stabilize sort by disabling last-resort comparison)
First sort on the secondary sort criteria. Then do a stable sort, which is keep the sort order within the rows that share the same key from the primary sort criteria.
$ cat Data3 | sort -k3,3n -t'/' | sort -k2,2n -t'|' -s
Help Text|50001/100/7
My Message|50001/100/7
Visible Text|50001/100/52
Text Message|50001/100/63
Invisible Text|50002/100/1
Help Message|50002/100/11
My Text|50002/100/43
You are a bit lucky in this case since -k2,2n -t'|' will treat the second column "50001/100/7" as a number, which will probably be 50001. You could end up in weird situations if that would be comma-separated instead of slash and you were using a different locale in your environment. For instance, default in my environment I run en_US.UTF-8 which behaves like this.
$ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=en_US.UTF-8 sort -k2,2n -t'|' -s
Help Text|50001,100,7
My Message|50001,100,7
Invisible Text|50002,100,1
Visible Text|50001,100,52
Text Message|50001,100,63
Help Message|50002,100,11
My Text|50002,100,43
What you would expect is this:
$ cat Data3 | tr '/' ',' | sort -k3,3n -t',' | LC_NUMERIC=C sort -k2,2n -t'|' -s
Help Text|50001,100,7
My Message|50001,100,7
Visible Text|50001,100,52
Text Message|50001,100,63
Invisible Text|50002,100,1
Help Message|50002,100,11
My Text|50002,100,43
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With