How to efficiently sum two columns in a file with 270,000+ rows in bash

I have two columns in a file, and I want to automate summing the two values on each row.

For example:

read write
5    6
read write
10   2
read write
23   44

I then want to sum the "read" and "write" values of each row. After summing, I find the maximum sum and write that value to a file. (For the sample above, the row sums are 11, 12, and 67, so 67 is the value I'd write out.) I feel like I have to use grep -v to get rid of the column headers on every row, which, as noted in the answer, makes the code inefficient, since I'm grepping the entire file just to read a single line.

I currently have this in a bash script (inside a for loop, where $x is the file name) to sum the columns line by line:

# count the data rows (grep -vi drops the header lines, case-insensitively)
lines=`grep -vi read $x | wc -l`
line_num=1
arr_num=0


while [ $line_num -le $lines ]
do

    # every iteration re-reads the whole file: grep strips the headers,
    # sed slices out a single line, and awk sums its two columns
    arr[$arr_num]=`grep -vi read $x | sed "${line_num}q;d" | awk '{print $1 + $2}'`
    echo $line_num
    line_num=$((line_num + 1))
    arr_num=$((arr_num + 1))

done

However, the file to be summed has 270,000+ rows, and the script has been running for a few hours with no end in sight. Since each iteration pipes the entire file through grep and sed just to extract one line, the work grows quadratically with the file's length. Is there a more efficient way to write this so that it does not take so long?
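
For reference, the repeated scanning can be avoided even in pure bash by reading the file once in a while read loop. This is a minimal sketch under the same assumptions as the sample above (two numeric columns, a header line before every data row, $x holding the file name, and a hypothetical output file max.txt):

max=0
while read -r r w
do
    # skip the header lines: keep only rows whose first field is numeric
    [[ $r == [0-9]* ]] || continue
    sum=$((r + w))
    (( sum > max )) && max=$sum
done < "$x"
echo "$max" > max.txt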

asked Mar 26 '14 by Emil


1 Answer

Use awk instead and take advantage of the modulus operator:

awk '!(NR%2){print $1+$2}' infile
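
This makes a single pass over the file: NR%2 is zero on every even-numbered line, so the odd-numbered header lines are skipped and the two columns of each data row are summed. With the sample input above it prints 11, 12, and 67. Since the end goal is only the maximum sum, the same pass can track it directly; a minimal sketch, again assuming non-negative read/write counts and a hypothetical output file max.txt:

awk '!(NR%2) {
    s = $1 + $2
    if (s > max) max = s    # max starts at 0, fine for non-negative sums
} END { print max }' infile > max.txt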
answered Nov 25 '22 by Juan Diego Godoy Robles