I am trying to read a file and sort it by the number of occurrences of a particular field. Suppose I want to find the most repeated date in a log file: I use uniq -c and sort the result in descending order, something like this:
uniq -c | sort -nr
This produces output like this:
809 23/Dec/2008:19:20
The first field, which is the count, is the problem for me. I want to get only the date from the above output, but I am not able to. I tried the cut command like this:
uniq -c | sort -nr | cut -d' ' -f2
but this just prints blank space. Can someone please help me get only the date and chop off the count? I want only
23/Dec/2008:19:20
Thanks
The count from uniq -c is right-aligned and padded with leading spaces unless it has 7 or more digits, so you need to do something like:
uniq -c | sort -nr | cut -c 9-
to get columns (character positions) 9 upwards. Or you can use sed:
uniq -c | sort -nr | sed 's/^.\{8\}//'
or:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
This second option is robust in the face of a repeat count of 10,000,000 or more; if you think that might be a problem, it is probably better than the cut alternative. There are undoubtedly other options available too.
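As a rough end-to-end sketch of the sed approach, assuming the dates have already been extracted one per line (the sample data and the variable names below are made up for illustration). Note that uniq only collapses adjacent duplicate lines, so the input is sorted before counting.

```shell
# Hypothetical sample data: one timestamp per line, as if already cut out of a log.
dates='23/Dec/2008:19:20
23/Dec/2008:19:21
23/Dec/2008:19:20
23/Dec/2008:19:20
23/Dec/2008:19:21
24/Dec/2008:08:05'

# Sort so duplicates are adjacent, count them, rank by count, then strip the count.
ranked=$(printf '%s\n' "$dates" | sort | uniq -c | sort -nr | sed 's/^ *[0-9]* //')

# The first line of the ranked list is the most repeated date.
top=$(printf '%s\n' "$ranked" | head -n 1)

printf '%s\n' "$ranked"
printf 'most repeated: %s\n' "$top"
```

The counts in this sample are all different, so the ranking does not depend on how sort breaks ties.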
Caveat: the counts were determined by experimentation on Mac OS X 10.7.3 but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:
printf("%d %s", repeat_count, line);
which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
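To see how that sed expression copes with the different layouts, here is a small sketch that fakes each format with printf rather than running three different uniq implementations. The widths are assumptions taken from the observations above (width 7 for GNU, 3 leading spaces before a single digit for BSD, no padding for POSIX), and strip_count is a hypothetical helper name.

```shell
# Hypothetical helper wrapping the sed expression from the answer.
strip_count() {
    sed 's/^ *[0-9]* //'
}

date_field='23/Dec/2008:19:20'

# Simulated "count line" layouts; the widths are assumptions, not real uniq output.
gnu_style=$(printf '%7d %s\n' 809 "$date_field" | strip_count)
bsd_style=$(printf '%4d %s\n' 9 "$date_field" | strip_count)
posix_style=$(printf '%d %s\n' 809 "$date_field" | strip_count)
wide_count=$(printf '%7d %s\n' 10000000 "$date_field" | strip_count)

# For comparison: fixed-column cut on an 8-digit count leaves a stray leading space.
cut_wide=$(printf '%7d %s\n' 10000000 "$date_field" | cut -c 9-)

printf '[%s]\n' "$gnu_style" "$bsd_style" "$posix_style" "$wide_count" "$cut_wide"
```

Because the substitution has no g flag, it removes only the first run of digits, so a line that itself starts with a number should keep that number.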
Instead of cut -d' ' -f2, try
awk '{$1="";print}'
Maybe you need to remove one more blank at the beginning:
awk '{$1="";print}' | sed 's/^.//'
or completely with sed, preserving the original whitespace:
sed -r 's/^[^0-9]*[0-9]+//'
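The difference between these variants shows up when the counted line contains runs of spaces. A small sketch on a made-up line; -E is used here in place of -r on the assumption that your sed accepts it, as GNU and BSD sed both do.

```shell
# Hypothetical uniq -c style line whose text has three spaces between words.
line='      2 foo   bar'

# awk rebuilds the record when $1 is assigned, so inner whitespace collapses
# and a single leading separator remains.
awk_only=$(printf '%s\n' "$line" | awk '{$1="";print}')

# Dropping the first character removes that leading separator.
awk_sed=$(printf '%s\n' "$line" | awk '{$1="";print}' | sed 's/^.//')

# sed removes only the padding and the count; the rest of the line is untouched,
# including the single space that followed the count.
sed_only=$(printf '%s\n' "$line" | sed -E 's/^[^0-9]*[0-9]+//')

printf '[%s]\n' "$awk_only" "$awk_sed" "$sed_only"
```

For a date field with no embedded spaces, as in the question, the awk and sed variants differ only in that leading blank.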
The following awk may help you here.
awk '{a[$0]++} END{for(i in a){print a[i],i | "sort -k2"}}' Input_file
Second solution: in case you want the output in the same order as the input rather than sorted.
awk '!a[$0]++{b[++count]=$0} {c[$0]++} END{for(i=1;i<=count;i++){print c[b[i]],b[i]}}' Input_file
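Both awk commands still print the count in front of each line. A possible variation, sketched here on made-up sample data, lets awk do the counting and then ranks by count and drops it. Since awk writes the count with no padding and a single space, the cut -d' ' approach from the question should work on this output.

```shell
# Hypothetical sample data: one timestamp per line.
dates='23/Dec/2008:19:20
23/Dec/2008:19:21
23/Dec/2008:19:20
23/Dec/2008:19:20
23/Dec/2008:19:21
24/Dec/2008:08:05'

# awk counts each distinct line (no pre-sorting needed) and prints "count line".
# sort -nr ranks by count; cut keeps everything after the first space.
ranked=$(printf '%s\n' "$dates" |
    awk '{seen[$0]++} END {for (line in seen) print seen[line], line}' |
    sort -nr |
    cut -d' ' -f2-)

printf '%s\n' "$ranked"
```

Using -f2- rather than -f2 keeps the whole line even if it contains further spaces.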
An alternative solution is this:
uniq -c | sort -nr | awk '{print $1, $2}'
You can also easily print a single field. Use this (since you used -f2 with cut in your question):
cat file |sort |uniq -c | awk '{ print $2; }'
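The reason awk succeeds where the cut from the question printed nothing can be shown on a single line. In this sketch the sample line is made up: cut treats every single space as a field separator, so leading padding produces empty fields, while awk's default splitting skips leading blanks and treats runs of blanks as one separator.

```shell
# Hypothetical uniq -c style line with leading padding before the count.
line='    809 23/Dec/2008:19:20'

# cut: fields 1 to 4 are the empty strings between the leading spaces.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk: $1 is the count and $2 is the date, regardless of the padding.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]\n' "$with_cut"
printf 'awk: [%s]\n' "$with_awk"
```

Keep in mind that print $2 returns only the first word after the count, which is fine for a date like this one but would truncate a line containing spaces.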