I have this test.txt file:
gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106
I need to get this output:
gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106
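-> In other words, I need to remove the "gene" lines and replace "exon" on each exon line with "gene-X", where X starts at 1 and increases by one at each "gene" line.
I'm struggling with this. Here are my two attempts, each followed by its output: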
awk '$1~/exon/ {print $0 (/^exon/ ? "-" (++c) : "")}' test.txt
exon 1:362275166-362275246-1
exon 1:362274811-362275058-2
exon 1:362274230-362274685-3
exon 1:362279796-362280179-4
exon 1:362280576-362280662-5
exon 1:362280858-362280958-6
exon 1:362281056-362281106-7
awk '$1~/exon/ {$1=$1 "-" (++count[$1])}1' test.txt
gene 1:362273700-362275735
exon-1 1:362275166-362275246
exon-2 1:362274811-362275058
exon-3 1:362274230-362274685
gene 1:362279796-362287281
exon-4 1:362279796-362280179
exon-5 1:362280576-362280662
exon-6 1:362280858-362280958
exon-7 1:362281056-362281106
Assuming the counter is based solely on the existence of the string gene in the 1st column ...
One awk idea:
awk '
$1=="gene" { cnt++; next }             # gene line: increment the counter, print nothing
$1=="exon" { $1="gene-" cnt; print }   # exon line: replace field 1 with gene-<cnt>
' test.txt
This generates:
gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106
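To try this without creating test.txt, here is a minimal self-contained sketch that feeds the same sample lines through awk on stdin. The function name label_exons is just a hypothetical helper for illustration. It also uses sub() rather than assigning to $1: assigning to a field makes awk rebuild the line with single spaces between fields, which does not matter for this input but would change tab-separated data. Treat it as one possible variant, not the only way to do it.

```shell
# Hypothetical helper (name is an assumption): same counter logic as above,
# reading from the files given as arguments, or from stdin if there are none.
label_exons() {
  awk '
    $1 == "gene" { cnt++; next }                 # gene line: bump counter, drop the line
    $1 == "exon" { sub(/exon/, "gene-" cnt); print }  # relabel first "exon", keep the rest as is
  ' "$@"
}

# The sample lines from test.txt.
sample='gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106'

result=$(printf '%s\n' "$sample" | label_exons)
printf '%s\n' "$result"
```

With a real file, the same function can be called as label_exons test.txt.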