Goal: To print the difference between two colon-separated fields ($3 and $2), plus an integer (+1), at the end of each line beginning with ">".
Representative sample of my file:
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product
MFLLHYYLIIQVI
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product
MQWIKDKVLIK
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product
MFYPLYLDYLYY
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP
Desired Output:
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product:111
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product:42
MFLLHYYLIIQVI
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product:36
MQWIKDKVLIK
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product:39
MFYPLYLDYLYY
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial:90
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP
My current awk script gets me very close by printing the difference between $3 and $2 at the end of each line, but it does not include the +1 addition step (required) and is not specific to lines beginning with ">", despite my attempt with /^ *>/ (not required, but nice):
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4=$3-$2} $4<0 {$4=-$4} 1' file
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product:110
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR:::0
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product:41
MFLLHYYLIIQVI:::0
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product:35
MQWIKDKVLIK:::0
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product:38
MFYPLYLDYLYY:::0
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial:89
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP:::0
Attempts to add the integer (+1) to the difference calculation:
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4+1=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {$4+=1=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
$ awk -F":" -v n=1 'BEGIN {OFS=FS} /^ *>/ {$4+n=$3-$2} $4<0 {$4=-$4} 1' file
awk: line 1: syntax error at or near =
And although I'm not sure how to implement functions using awk, I think there could be some utility in using something similar to this:
$ function add_one (number) {
return number + 1
}
$ awk -F":" 'BEGIN {OFS=FS} /^ *>/ {add_one($4)=$3-$2} $4<0 {$4=-$4} 1' file
While I have been attempting to use awk to solve this problem, I am interested in any solution (e.g., since I am attempting to perform this calculation line-by-line, perhaps there is a more efficient solution with sed?).
Here is an alternative awk solution that should work on all awk versions:
awk 'BEGIN {FS=OFS=":"} /^>/ {
v3=$3+0
diff = 1 + (v3 > $2 ? v3-$2 : $2-v3)
$0 = $0 OFS diff
} 1' file
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product:111
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF2_ 17609 17804 (+):70:111 unnamed protein product:42
MFLLHYYLIIQVI
>lcl|ORF3_ 17609 17804 (+):112:147 unnamed protein product:36
MQWIKDKVLIK
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product:39
MFYPLYLDYLYY
>lcl|ORF5_ 17609 17804 (+):90:1 unnamed protein product, partial:90
MIMKKEQMELLYHSHQIYFLPFPLHQNIHP
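The add_one idea from the question can also be made to work, with two caveats: an awk function has to be defined inside the awk program itself (not at the shell prompt), and a function call cannot appear on the left-hand side of an assignment, which is also why $4+1=... fails with a syntax error. The following is only a sketch under those assumptions; the names append_length, abs_diff_plus_one and sample.fa are made up for illustration and are not part of the original question or answer.

```shell
# Hypothetical two-record sample taken from the question's data.
cat > sample.fa <<'EOF'
>lcl|ORF1_ 17609 17804 (+):21:131 unnamed protein product
MEKVKNKFDENDIKVPFVPSSLLFNNTGNLNTMDKR
>lcl|ORF4_ 17609 17804 (+):129:91 unnamed protein product
MFYPLYLDYLYY
EOF

# Hypothetical wrapper: appends |$3 - $2| + 1 to every header line of file $1.
append_length() {
    awk -F: '
    # d is a local variable (awk declares locals as extra parameters)
    function abs_diff_plus_one(a, b,    d) {
        d = a - b
        if (d < 0) d = -d
        return d + 1
    }
    # Only header lines are changed; +0 forces a numeric conversion,
    # so "131 unnamed protein product" is treated as 131.
    /^>/ { $0 = $0 FS abs_diff_plus_one($3 + 0, $2 + 0) }
    1' "$1"
}

append_length sample.fa
```

For the two sample headers this should append :111 and :39 respectively, leaving the sequence lines untouched. Appending to $0 instead of assigning $4 avoids the stray ":::0" that shows up on sequence lines in the question's output.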
PS: Make sure to remove DOS line breaks (carriage returns) from your input file before running this awk command.
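One possible way to strip the DOS line breaks is tr, shown here as a minimal sketch. The file names dos.fa and unix.fa and the shortened header are placeholders, not files from the question. If the carriage return is left in place, it stays at the end of each header line, so the appended value lands after it and a terminal may draw it over the start of the line.

```shell
# Hypothetical CRLF-terminated input, built with printf for the demo.
printf '>id:21:131 desc\r\nMEKV\r\n' > dos.fa

# Delete every carriage return so each line ends with a bare LF.
tr -d '\r' < dos.fa > unix.fa

# Same awk program as in the answer, run on the cleaned copy.
awk 'BEGIN {FS=OFS=":"} /^>/ {
v3=$3+0
diff = 1 + (v3 > $2 ? v3-$2 : $2-v3)
$0 = $0 OFS diff
} 1' unix.fa
```

Some systems also ship a dos2unix utility that does the same conversion in place, if it happens to be installed.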