I'm trying to extract IP addresses from my Apache log, count them, and sort them.
And for whatever reason, the sorting part is horrible.
Here is the command:
cat access.* | awk '{ print $1 }' | sort | uniq -c | sort -n
Output example:
16789 65.X.X.X
19448 65.X.X.X
1995 138.X.X.X
2407 213.X.X.X
2728 213.X.X.X
5478 188.X.X.X
6496 176.X.X.X
11332 130.X.X.X
I don't understand why these values aren't really sorted. I've also tried removing the blanks at the start of the line (sed 's/^[\t ]*//g') and using sort -n -t" " -k1, which doesn't change anything.
Any hints?
Approach: The idea is to use a custom comparator to sort the given IP addresses. Since an IPv4 address has 4 octets, we compare two addresses octet by octet, starting with the first. If the first address has the greater octet, return True so that the two addresses are swapped; if it has the smaller octet, return False. If the octets are equal, move on to the next octet and repeat the check.
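The octet-by-octet comparison can also be expressed with sort itself rather than a hand-written comparator. The following is only a sketch: the function name sort_ips and the sample addresses are made up for illustration. It splits each line on "." and compares the four fields numerically, in order.

```shell
# sort_ips is a hypothetical helper name, not part of any tool.
# -t .    : use "." as the field separator
# -kN,Nn  : compare field N only, as a number
sort_ips() {
  sort -t . -k1,1n -k2,2n -k3,3n -k4,4n
}

# Made-up sample addresses
printf '%s\n' 192.168.1.10 10.0.0.2 192.168.1.9 10.0.0.10 | sort_ips
# 10.0.0.2
# 10.0.0.10
# 192.168.1.9
# 192.168.1.10
```

A plain lexical sort would put 10.0.0.10 before 10.0.0.2 and 192.168.1.10 before 192.168.1.9, which is why each octet is compared as a number.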
This may be late, but using a numeric sort (sort -n) for the first sort as well should give you the desired result:
cat access.log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
Output:
29877 93.xxx.xxx.xxx
17538 80.xxx.xxx.xxx
5895 198.xxx.xxx.xxx
3042 37.xxx.xxx.xxx
2956 208.xxx.xxx.xxx
2613 94.xxx.xxx.xxx
2572 89.xxx.xxx.xxx
2268 94.xxx.xxx.xxx
1896 89.xxx.xxx.xxx
1584 46.xxx.xxx.xxx
1402 208.xxx.xxx.xxx
1273 93.xxx.xxx.xxx
1054 208.xxx.xxx.xxx
860 162.xxx.xxx.xxx
830 208.xxx.xxx.xxx
606 162.xxx.xxx.xxx
545 94.xxx.xxx.xxx
480 37.xxx.xxx.xxx
446 162.xxx.xxx.xxx
398 162.xxx.xxx.xxx
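One possible explanation for the output in the question, offered as an assumption rather than a confirmed diagnosis: GNU sort -n accepts the locale's thousands separator inside a number, and in a locale where that separator is a space, "16789 65" may be read as 1678965. Read that way, the sample in the question is in ascending order. A minimal sketch that forces the C locale for the counting sort only (the addresses below are made up):

```shell
# Made-up sample standing in for the output of `uniq -c`
counts='  16789 65.0.0.1
   1995 138.0.0.1
  11332 130.0.0.1
   2407 213.0.0.1'

# LC_ALL=C applies to this one sort invocation only, so a space
# can never be treated as a digit-group separator.
sorted=$(printf '%s\n' "$counts" | LC_ALL=C sort -n)
printf '%s\n' "$sorted"
```

Applied to the command from the question, that would be: cat access.* | awk '{ print $1 }' | sort | uniq -c | LC_ALL=C sort -n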