Multiply/divide columns in bash

I have a whitespace-separated file that looks like this:

ERR843978.19884 13 51 51
ERR843978.2880 10 49 51
ERR843978.10002 7 48 55
ERR843978.1158 8 45 54
ERR843978.4671 14 62 60
ERR843978.83 15 56 70
ERR843978.9406 8 56 39
ERR843978.8383 12 59 43
ERR843978.8916 6 51 42

and I wish to do this for all lines:

column2/(column3*column4)

and then write the result to a new file.

I've written a bash script that does it, but it's kinda slow, so I'm looking for a more efficient solution (maybe with awk?).

Here's my code:

while read -r line
do
        out0=$(awk '{print $1}' <<< "$line")
        out1=$(awk '{print $2}' <<< "$line")
        out2=$(awk '{print $3}' <<< "$line")
        out3=$(awk '{print $4}' <<< "$line")
        out4=$(echo "scale=5; ($out1 / ($out2 * $out3))" | bc -l)
        echo "$out0;$out4"
done < "$file"
EvenStar69 asked Mar 30 '18 09:03


2 Answers

Yes, awk is quite efficient here:

awk '{ print $2/($3 * $4) }' file > newfile
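The one-liner prints only the ratio. If you also want the ID in front and five decimals, as in the question's output, a minimal sketch is below. The function name ratio_file is hypothetical, and printing NA when column3*column4 is zero is an assumption about how you want to handle that case.

```bash
#!/usr/bin/env bash
# ratio_file: print "ID;column2/(column3*column4)" with 5 decimals per line.
# Hypothetical helper; writes NA instead of dividing by zero.
ratio_file() {
  awk '{
    d = $3 * $4
    if (d == 0) { printf "%s;NA\n", $1; next }  # avoid division by zero
    printf "%s;%.5f\n", $1, $2 / d              # %.5f rounds to 5 decimals
  }' "$1"
}
```

Usage: ratio_file file > newfile. Everything still happens in a single awk process, so it keeps the speed of the one-liner.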
RomanPerekhrest answered Nov 02 '22 17:11


If you split the line with read (as @Cyrus suggested, but without the div)

while read -r column1 column2 column3 column4
do
    echo "$column1;$(echo "scale=5; ($column2 / ($column3 * $column4))" | bc)"
done < "$file"

it is noticeably faster: on my machine, 6 sec per 1,000 rows for the original script vs. 1.7 sec per 1,000 rows with read, roughly 3.5 times faster.
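To reproduce timings like these, you need an input file of a known size. Below is a minimal sketch of a hypothetical helper, make_sample, that writes N synthetic rows in the question's format. The IDs and value ranges are made up; only the shape of the data matches, and column3/column4 are kept non-zero.

```bash
#!/usr/bin/env bash
# make_sample: write N synthetic rows "ID col2 col3 col4" for benchmarking.
# Hypothetical helper; values come from $RANDOM and are not real data.
make_sample() {
  local n=$1 i
  for ((i = 1; i <= n; i++)); do
    printf 'ERR843978.%d %d %d %d\n' "$i" \
      $((RANDOM % 20 + 1)) $((RANDOM % 40 + 30)) $((RANDOM % 40 + 30))
  done
}
```

For example, run make_sample 1000 > sample.txt, set file=sample.txt, and wrap each loop in bash's time keyword (time { ...; }). Absolute numbers will vary from machine to machine; the ratios between approaches are what matter.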

With a combination of sed, bc and paste:

tmp=$(mktemp)
{
  echo "scale=5"
  sed -E 's/(.*) ([0-9]+) ([0-9]+) ([0-9]+)/\2 \/ ( \3 * \4 )/' "$file"
} | bc > "$tmp"
cut -d ' ' -f 1 "$file" | paste -d ';' - "$tmp"
rm -f "$tmp"

it took 1.1 sec per 100,000 rows. That is roughly 150 times faster than the read loop (170 sec per 100,000 rows), and it explains why while loops have a bad reputation.
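In bash, the temporary file in the sed/bc/paste pipeline can be replaced with process substitution. A minimal sketch under that assumption; it needs bash (not plain sh), and the function name ratio_sed_bc is hypothetical. As in the pipeline above, bc runs only once for the whole file, and sed turns each line "ID a b c" into the bc expression "a / ( b * c )".

```bash
#!/usr/bin/env bash
# ratio_sed_bc: sed | bc | paste without a temp file (bash-only).
# Column 1 comes from cut; the ratios come from a single bc process.
ratio_sed_bc() {
  local file=$1
  paste -d ';' \
    <(cut -d ' ' -f 1 "$file") \
    <( {
         echo "scale=5"
         sed -E 's/^[^ ]+ ([0-9]+) ([0-9]+) ([0-9]+)$/\1 \/ ( \2 * \3 )/' "$file"
       } | bc )
}
```

Note that bc prints values below 1 without a leading zero (for example .00499), so the output format differs slightly from awk's.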

Using ksh93, which supports floating-point arithmetic, you get similar numbers:

typeset -F5 column2 column3 column4
while read -r column1 column2 column3 column4
do
    printf "%s;%.5f\n" "$column1" "$(( column2 / (column3 * column4) ))"
done < "$file"

0.9 sec per 100,000 rows. This shows that the slow part is not the loop itself but calling the external command bc on every iteration.
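Bash has no floating point, but the same point (keep the loop, drop the per-line external command) can be illustrated with fixed-point integer arithmetic. A minimal sketch, assuming non-negative integer columns; the function name ratio_fixed_point is hypothetical. The quotient is computed in units of 1e-5 and truncated, which matches bc with scale=5 rather than printf's rounding.

```bash
#!/usr/bin/env bash
# ratio_fixed_point: pure-bash loop, no external command per line.
# Assumes non-negative integer columns; 10# forces base 10 so values
# like "08" are not parsed as octal.
ratio_fixed_point() {
  local id a b c d q
  while read -r id a b c; do
    d=$(( 10#$b * 10#$c ))
    if (( d == 0 )); then
      printf '%s;NA\n' "$id"            # avoid division by zero
      continue
    fi
    q=$(( 10#$a * 100000 / d ))          # quotient scaled by 10^5, truncated
    printf '%s;%d.%05d\n' "$id" $(( q / 100000 )) $(( q % 100000 ))
  done < "$1"
}
```

Usage: ratio_fixed_point file > newfile. Like the other read loops, it skips a final line that has no trailing newline.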

And yes, awk is still faster: 1.4 sec per 1,000,000 rows, roughly 6 to 8 times faster than the ksh93 and sed/bc/paste versions.
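One difference the timings do not show: bc with scale=5 truncates, while printf-style %.5f formatting (as in the ksh93 loop) rounds, so the approaches can disagree in the last digit. A minimal sketch using the first sample row, 13 / (51 * 51) = 0.0049980...:

```bash
#!/usr/bin/env bash
# Compare bc's scale (truncation) with printf's %.5f (rounding)
# for the first sample row.
bc_value=$(echo "scale=5; 13 / (51 * 51)" | bc)
awk_value=$(awk 'BEGIN { printf "%.5f", 13 / (51 * 51) }')
echo "bc:  $bc_value"
echo "awk: $awk_value"
```

With GNU bc this prints .00499 for bc and 0.00500 for awk. If the outputs of two methods must match exactly, pick one convention.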

ULick answered Nov 02 '22 17:11