
Use awk to find average of a column [duplicate]

Tags:

bash

awk

I'm attempting to find the average of the second column of data using awk for a class. This is my current code, with the framework my instructor provided:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
x=sum
read name
        awk 'BEGIN{sum+=$2}'
        # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
        # NR is a variable equal to the number of rows in the file
        print "Average: " sum/ NR
        # Change this to print the Average instead of just the number of rows
}

and I'm getting an error that says:

awk: avg.awk:11:        awk 'BEGIN{sum+=$2}' $name
awk: avg.awk:11:            ^ invalid char ''' in expression

I think I'm close but I really have no idea where to go from here. The code shouldn't be incredibly complex as everything we've seen in class has been fairly basic. Please let me know.

Ben Zifkin asked Oct 03 '13 02:10



1 Answer

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }' 

This adds the numbers in $2 (the second column) to sum (awk auto-initializes variables to zero) and increments the row count n. At the end, if at least one value was read, it prints the average. The row count can also be taken from the built-in variable NR:

awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' 
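As a quick check that the two one-liners agree, here is a minimal sketch run on made-up data. The names and scores are invented for illustration; the question does not say what the real input looks like.

```shell
# Hypothetical sample input: a name in column 1, a score in column 2.
data='alice 10
bob 20
carol 30'

# Version with an explicit row counter n.
printf '%s\n' "$data" | awk '{ sum += $2; n++ } END { if (n > 0) print sum / n }'

# Version using the built-in record counter NR.
printf '%s\n' "$data" | awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'
```

Both commands print 20 for this input. The explicit counter starts to matter once a pattern filters rows (for example `NF >= 2 { sum += $2; n++ }`), because NR still counts the lines that were skipped.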

If you want to use the shebang notation, you could write:

#!/bin/awk -f

{ sum += $2 }
END { if (NR > 0) print sum / NR }
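One way to try the script form is sketched below. It assumes the file is called avg.awk (the name that appears in the question's error message) and uses a throwaway directory and two invented input rows.

```shell
# Write the script to a temporary directory; avg.awk is the name from the error message.
dir=$(mktemp -d)
cat > "$dir/avg.awk" <<'EOF'
#!/bin/awk -f
{ sum += $2 }
END { if (NR > 0) print sum / NR }
EOF

# Run it through awk -f; in this mode the shebang line is only a comment.
result=$(printf 'a 4\nb 6\n' | awk -f "$dir/avg.awk")
echo "$result"

# Clean up the temporary directory.
rm -r "$dir"
```

This prints 5 for the two sample rows. To execute the file directly (after chmod +x), the shebang needs the -f option and the path where awk actually lives on the system, which is often /usr/bin/awk rather than /bin/awk.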

You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).
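For instance, a fixed two-decimal format could look like the sketch below. The four input rows are made up, and "%.2f\n" is just one possible format; the "%13.6e\n" format mentioned above would be dropped in the same way.

```shell
# Hypothetical sample input; the second column averages to 2.5.
printf 'a 1\nb 2\nc 3\nd 4\n' |
  awk '{ sum += $2 } END { if (NR > 0) printf("%.2f\n", sum / NR) }'
```

This prints 2.50. Unlike print, printf does not add a newline on its own, so the format string has to include "\n".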

You can also generalize the code to average the Nth column (with N=2 in this sample) using:

awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }' 
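To illustrate the generalized form, the sketch below averages the third column of some invented three-column input by passing N=3.

```shell
# Hypothetical three-column input; -v N=3 selects the third column.
printf 'a 1 10\nb 2 20\nc 3 60\n' |
  awk -v N=3 '{ sum += $N } END { if (NR > 0) print sum / NR }'
```

This prints 30. The -v option assigns N before the program starts, and $N refers to the field whose number is the current value of N.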
Jonathan Leffler answered Sep 25 '22 05:09