Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if two lines start with the same character, if so the output average, if not, print actual value

Tags:

bash

shell

awk

I'd like to check if two rows start with the same number in the 1st column, if that happens, then the average of the 2nd column should be displayed. File example:

01  21    6    10%       93.3333%
01  22    50   83.3333%  93.3333%
02  20.5  23   18.1102%  96.8504%
02  21.5  100  78.7402%  96.8504%
03  22.2  0    0%        100%
03  21.2  29   100%      100%
04  22.5  1    5.55556%  100%
04  23.5  17   94.4444%  100%
05  22.7  9    7.82609%  100%
05  21.7  106  92.1739%  100%
06  23    11   17.4603%  96.8254%
06  22    50   79.3651%  96.8254%
07  20.5  14   18.6667%  96%
07  21.5  58   77.3333%  96%
08  21.8  4    100%      100%
09  22.6  0    0%        100%
09  21.6  22   100%      100%

For instance, the two first lines start with 01, but there is only one line starting with 08 (15th line). Therefore, the output based on these two cases should be:

01 21.5
...
...
...
08 21.8
...
...
...

I ended up with the following awk line, which works great when the file always has two similar lines, but it fails using the file shown above (because the 15th line):

awk '{sum+=$2} (NR%2)==0{print sum/2; sum=0;}'

Any hint is welcomed,

like image 576
Gery Avatar asked Sep 25 '15 19:09

Gery


3 Answers

This awk should work:

awk 'function dump(){if (n>0) printf "%s%s%.2f\n", p, OFS, sum/n}
     NR>1 && $1 != p{dump(); sum=n=0} {p=$1; sum+=$2; n++} END{dump()}' file
01 21.5
02 21.0
03 21.7
04 23.0
05 22.2
06 22.5
07 21.0
08 21.8
09 22.1

Explanation: We are using 3 variables:

p -> to hold previous row's $1 value
n -> count of similar $1 values
sum -> is sum of $2 values for similar $1 rows

How it works:

NR>1 && $1 != p     # when row #1 > 1 and prev $1 is not current $1
dump()              # function is to print formatted value of $1 and average
p=$1; sum+=$2; n++  # sets p to $1, adds current $2 to sum and increments n
like image 122
anubhava Avatar answered Sep 19 '22 23:09

anubhava


Using GNU awk

gawk '
    {sum[$1]+=$2; n[$1]++} 
    END {
        PROCINFO["sorted_in"] = "@ind_num_asc"
        for (key in sum) print key, sum[key]/n[key]
    }
' file
01 21.5
02 21
03 21.7
04 23
05 22.2
06 22.5
07 21
08 21.8
09 22.1

The "PROCINFO" line makes the array traversal sorted my index numerically. Otherwise the output would appear random.

like image 33
glenn jackman Avatar answered Sep 17 '22 23:09

glenn jackman


awk with piped sort

awk '{s[$1]+=$2;c[$1]++} END{for(i in s) print i, s[i]/c[i]}' file | sort
like image 20
karakfa Avatar answered Sep 21 '22 23:09

karakfa