Parsing column values

Question

I have this test.txt file:

gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106

I need to get this output:

gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106

In other words, I need to remove the "gene" lines and replace the "exon" label on each exon line with "gene-X", where X is the number of the gene the exon belongs to (starting at 1).

I'm struggling with this. Here is what I have tried so far:

awk '$1~/exon/ {print $0 (/^exon/ ? "-" (++c) : "")}' test.txt

exon 1:362275166-362275246-1
exon 1:362274811-362275058-2
exon 1:362274230-362274685-3
exon 1:362279796-362280179-4
exon 1:362280576-362280662-5
exon 1:362280858-362280958-6
exon 1:362281056-362281106-7

awk '$1~/exon/ {$1=$1 "-" (++count[$1])}1' test.txt

gene 1:362273700-362275735
exon-1 1:362275166-362275246
exon-2 1:362274811-362275058
exon-3 1:362274230-362274685
gene 1:362279796-362287281
exon-4 1:362279796-362280179
exon-5 1:362280576-362280662
exon-6 1:362280858-362280958
exon-7 1:362281056-362281106

markp-fuso · Accepted Answer

Assuming the counter is incremented solely when the string gene appears in the 1st column ...

One awk idea:

awk '
$1=="gene" { cnt++; next }               # new gene: bump the counter, skip the line
$1=="exon" { $1="gene-" cnt; print }     # relabel the exon with the current gene number
' test.txt

This generates:

gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106
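One side effect worth knowing about: assigning to $1 makes awk rebuild the whole record using OFS (a single space by default), so a tab-separated input file would come out space-separated. If the original separators need to be preserved, a possible variant is to substitute the label inside $0 with sub() instead of assigning to the field. This is only a sketch. It assumes the exon label is the first occurrence of the text exon on the line, which holds here because $1 is exactly "exon". The function name renumber_exons is a hypothetical wrapper added for illustration:

```shell
#!/bin/sh
# renumber_exons is a hypothetical helper name; it reads the files given
# as arguments (or stdin if none) and prints only the relabelled exon lines.
renumber_exons() {
  awk '
    $1 == "gene" { cnt++; next }     # new gene: bump the counter, drop the line
    $1 == "exon" {
      sub(/exon/, "gene-" cnt)       # edit $0 in place, so tabs/spaces are kept
      print
    }
  ' "$@"
}
```

Called as renumber_exons test.txt on the sample above, this should print the same seven lines as the answer's version, since that file is space-separated. The difference only shows up when the columns are separated by tabs or runs of spaces.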

Tags:

awk

pedro