
Sorting the lines in a CSV according to those containing numbers and those without

I have a 4-column CSV file. I want to sort the lines so that lines containing a number anywhere in the third column are pushed to the end of the document, and lines without numbers in the third column are moved to the beginning. How can I sort the file in this way?

Update:

To clarify, I need to move lines which contain any number (i.e. a match for [0-9]) somewhere within the letters of the third column (the third column of the line might contain other symbols). Spaces are not important. E.g.

dog, eats chicken, has 4 legs, does not like cats
cat, eats mice, has a tail, does not like water
mouse, eats bugs, has 4 legs, does not like cats
elephant, eats peanuts, has a trunk, does not like mice

Would be sorted to:

cat, eats mice, has a tail, does not like water
elephant, eats peanuts, has a trunk, does not like mice
dog, eats chicken, has 4 legs, does not like cats
mouse, eats bugs, has 4 legs, does not like cats
Village asked Jan 24 '12 09:01



1 Answer

Something like this should work (replace file.csv with the name of your file):

awk 'BEGIN {FS=","; OFS=","}; {print match($3,/[0-9]/), $0}' file.csv | sort | cut -d, -f2-

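The strategy is to

  • use awk to prepend to each line the position within the third column at which the first digit is found (or 0 if the column contains no digit)
  • use sort to sort all the lines, so that the lines prefixed with 0 come first
  • finally, use cut to remove the number that was prepended by awk.

Note that sort compares whole lines as text, so the original order of the lines within each group is not preserved.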
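
The pipeline above leaves the order within each group up to sort. If the input order should be kept, one possible alternative is a single awk pass that prints digit-free lines immediately and holds back the others until the end. This is only a sketch, not part of the original answer: the function name digits_last and the array name held are made up for illustration, and it assumes plain comma-separated fields with no quoted commas.

```shell
# Hypothetical helper (name chosen for illustration): digits_last FILE
# Prints lines whose third comma-separated field has no digit first,
# then the lines whose third field contains a digit, each group in input order.
digits_last() {
  awk -F, '
    $3 ~ /[0-9]/ { held[++n] = $0; next }            # buffer lines with a digit in column 3
    { print }                                         # emit digit-free lines right away
    END { for (i = 1; i <= n; i++) print held[i] }   # then emit the buffered lines
  ' "$1"
}
```

For example, digits_last file.csv > sorted.csv would write the reordered lines to a new file (both file names are placeholders). Because the lines with digits are buffered in memory, this is best suited to files that fit comfortably in RAM.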
jcollado answered Sep 29 '22 22:09
