I tried to reformat a file containing:
>Human|chr16:86430087-86430726 | element 1 | positive
>Human|chr16:85620095-85621736 | element 2 | negative
>Human|chr16:80423343-80424652 | element 3 | negative
>Human|chr16:80372593-80373755 | element 4 | positive
>Human|chr16:79969907-79971297 | element 5 | negative
>Human|chr16:79949950-79951518 | element 6 | negative
>Human|chr16:79026563-79028162 | element 7 | negative
>Human|chr16:78933253-78934686 | element 9 | negative
>Human|chr16:78832182-78833595 | element 10 | negative
My command is:
awk '{FS="|";OFS="\t"} {print $1,$2,$3,$4,$5}'
Here is the output:
>Human|chr16:86430087-86430726 | element 1 |
>Human chr16:85620095-85621736 element 2 negative
>Human chr16:80423343-80424652 element 3 negative
>Human chr16:80372593-80373755 element 4 positive
>Human chr16:79969907-79971297 element 5 negative
>Human chr16:79949950-79951518 element 6 negative
>Human chr16:79026563-79028162 element 7 negative
>Human chr16:78933253-78934686 element 9 negative
>Human chr16:78832182-78833595 element 10 negative
Every line works fine except for the first line. I don't understand why this happened.
Can someone help me with it? Thanks!
FS - Field Separator. NF - Number of Fields. NR - Number of Records read so far (the current record number). OFS - Output Field Separator.
The built-in variable FS sets the input field separator. By default, runs of spaces and tabs act as field separators. The field values can then be accessed as $1, $2, $3, and so on. The -F command-line option is another way to set the input field separator.
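A minimal sketch of these variables in action, assuming a POSIX sh and a POSIX awk; `sample` is a hypothetical helper that prints two records from the question. Because `-F '|'` sets FS before the first record is read, both records report the same field count.

```shell
#!/bin/sh
# Hypothetical helper: two records copied from the question's input.
sample() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '>Human|chr16:85620095-85621736 | element 2 | negative'
}

# -F '|' sets FS on the command line, i.e. before record 1 is read.
# NR is the current record number, NF the number of fields in it.
count_fields() {
  sample | awk -F '|' '{ print "record " NR ": " NF " fields" }'
}

count_fields
```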
The lowercase -f option is unrelated to field splitting: it only controls where the awk program is read from. With -f progfile, awk reads the program text from progfile and treats every remaining operand as an input file (or a VAR=VALUE assignment). Without -f, the first operand is the program text itself.
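A sketch contrasting `-f` with `-F`, assuming `mktemp` is available (it is common but not part of POSIX); the temporary program file and the `sample` helper are hypothetical. The program file sets FS and OFS in a BEGIN block, so the first record is split correctly.

```shell
#!/bin/sh
# Hypothetical helper: two records copied from the question's input.
sample() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '>Human|chr16:85620095-85621736 | element 2 | negative'
}

# Write the awk program to a temporary file; delete it when the script exits.
prog=$(mktemp) || exit 1
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
BEGIN { FS = "|"; OFS = "\t" }
{ print $1, $2, $3, $4 }
EOF

# Lowercase -f: read the program text from "$prog".
# (Uppercase -F would instead set the field separator.)
run_from_file() {
  sample | awk -f "$prog"
}

run_from_file
```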
When the record count is a multiple of 3 (NR%3 == 0), the output record separator (ORS) is set to the default record separator (RS), which is newline. If the record count is not a multiple of 3 (NR%3 != 0), the output record separator is set to the default field separator (FS), which is space.
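The NR%3 idiom can be sketched like this (a minimal example, not taken from the answers below): ORS is reassigned before every `print`, so every third record ends with a newline and the others end with a space.

```shell
#!/bin/sh
# Join every three input lines into one output line.
# ORS is chosen per record: "\n" when NR is a multiple of 3, else " ".
join_by_three() {
  awk '{ ORS = (NR % 3 == 0) ? "\n" : " "; print }'
}

printf '%s\n' a b c d e f | join_by_three
```

If the line count is not a multiple of 3, the last output line ends with a space and has no trailing newline.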
FS and OFS are set inside the main block, which is too late for FS to affect the first line; use something like this instead:
awk '{print $1,$2,$3,$4,$5}' FS='|' OFS='\t'
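A minimal sketch on two of the question's records, assuming a POSIX sh and awk; `sample` is a hypothetical helper. The input has only four `|`-separated fields, so it prints `$1` to `$4`. The fields keep the spaces that surround each `|`; the second function is an added variation (not part of the answer) that uses the regex FS ` *[|] *` to drop them.

```shell
#!/bin/sh
# Hypothetical helper: two records copied from the question's input.
sample() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '>Human|chr16:85620095-85621736 | element 2 | negative'
}

# Assignments after the program text are applied before the first
# record is read, so record 1 is already split on "|".
split_on_bar() {
  sample | awk '{ print $1, $2, $3, $4 }' FS='|' OFS='\t'
}

# Variation: a regex FS that also swallows the blanks around each "|".
split_and_trim() {
  sample | awk '{ print $1, $2, $3, $4 }' FS=' *[|] *' OFS='\t'
}

split_on_bar
split_and_trim
```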
You can also use this shorter version:
awk -v FS='|' -v OFS='\t' '$1=$1'
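One caveat, sketched below with made-up input: `$1=$1` is used as a pattern, so its value decides whether the record is printed. A record whose first field is empty (or `0`) is silently dropped; doing the assignment in an action and printing with the always-true pattern `1` avoids that.

```shell
#!/bin/sh
# Hypothetical input: one normal record and one whose first field is empty.
demo_input() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '|empty first field'
}

# Pattern form from the answer: the record is printed only if the
# assigned value of $1 is true (non-empty and non-zero).
rebuild_pattern() {
  awk -v FS='|' -v OFS='\t' '$1=$1'
}

# Action form: always rebuild $0 with OFS, then print every record.
rebuild_always() {
  awk -v FS='|' -v OFS='\t' '{ $1 = $1 } 1'
}

demo_input | rebuild_pattern   # second record is dropped
demo_input | rebuild_always    # both records are printed
```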
It doesn't work because, by the time FS and OFS are set inside the main block, awk has already split the current record into fields; the new FS only takes effect from the next record. You can force the current record to be re-split by assigning $0 to itself, e.g.:
awk '{FS="|";OFS="\t";$0=$0} {print $1,$2,$3,$4,$5}'
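A side-by-side sketch, assuming a POSIX sh and an awk that follows the POSIX field-splitting rules (gawk and mawk do); `sample` is a hypothetical helper. The first function reproduces the question's problem on record 1, and the `$0=$0` version splits every record on `|`.

```shell
#!/bin/sh
# Hypothetical helper: two records copied from the question's input.
sample() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '>Human|chr16:85620095-85621736 | element 2 | negative'
}

# As asked: FS is assigned inside the main action, after record 1 has
# already been split on whitespace, so only records 2+ use "|".
as_asked() {
  sample | awk '{FS="|";OFS="\t"} {print $1,$2,$3,$4}'
}

# Fix: assigning $0 to itself re-splits the current record with the
# FS that is now in effect.
resplit() {
  sample | awk '{FS="|";OFS="\t";$0=$0} {print $1,$2,$3,$4}'
}

as_asked
resplit
```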
The conventional ways to do this are:
1. set FS and the others in the BEGIN clause,
2. set them with the -v VAR=VALUE notation, or
3. append them after the script as VAR=VALUE.
My preferred style is the last alternative:
awk '{print $1,$2,$3,$4,$5}' FS='|' OFS='\t'
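A quick check that the three conventional forms agree, assuming a POSIX sh and awk; `sample` and the three function names are hypothetical. Each form sets FS before record 1 is read, so all three should print identical output.

```shell
#!/bin/sh
# Hypothetical helper: two records copied from the question's input.
sample() {
  printf '%s\n' \
    '>Human|chr16:86430087-86430726 | element 1 | positive' \
    '>Human|chr16:85620095-85621736 | element 2 | negative'
}

# 1. BEGIN clause
via_begin() {
  sample | awk 'BEGIN { FS = "|"; OFS = "\t" } { print $1, $2, $3, $4 }'
}
# 2. -v assignments
via_v() {
  sample | awk -v FS='|' -v OFS='\t' '{ print $1, $2, $3, $4 }'
}
# 3. assignments after the script
via_operand() {
  sample | awk '{ print $1, $2, $3, $4 }' FS='|' OFS='\t'
}

if [ "$(via_begin)" = "$(via_v)" ] && [ "$(via_v)" = "$(via_operand)" ]; then
  echo "all three forms agree"
fi
```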
Note that -v and post-script assignments take effect at significantly different times. -v sets variables before the BEGIN clause runs, whereas post-script assignments are applied after the BEGIN clause, when awk reaches them in the argument list (just before it reads the input file that follows them).
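A small sketch of that timing difference (the variable names `a` and `b` are arbitrary). The operand `b=...` comes just before `-`, the standard-input operand, so it is applied after BEGIN but before the first record is read.

```shell
#!/bin/sh
# a is set with -v (before BEGIN); b is set as an operand (after BEGIN,
# just before standard input, "-", is read).
show_timing() {
  printf 'x\n' | awk -v a=set_by_v '
    BEGIN { print "BEGIN: a=" a " b=" b }
    { print "main: a=" a " b=" b }' b=set_after_script -
}

show_timing
```

In BEGIN, `b` is still empty; in the main block it holds the operand's value.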
Try:
awk 'BEGIN{FS="|";OFS="\t"} {print $1,$2,$3,$4,$5}'