I have a bunch of files using the format file.1.a.1.txt that look like this:
A 1
B 2
C 3
D 4
and was using the following command to add a new column containing the name of each file:
awk '{print FILENAME (NF?"\t":"") $0}' file.1.a.1.txt > file.1.a.1.txt
which ended up making them look how I want:
file.1.a.1.txt A 1
file.1.a.1.txt B 2
file.1.a.1.txt C 3
file.1.a.1.txt D 4
However, I need to do this for multiple files as a job on an HPC using sbatch submission. But when I run the following job script:
#!/bin/bash
#<other SBATCH info>
#SBATCH --array=1-10
N=$SLURM_ARRAY_TASK_ID
for j in {a,b,c};
do
for i in {1,2,3}
do awk '{print FILENAME (NF?"\t":"") $0}' file.${N}."$j"."$i".txt > file.${N}."$j"."$i".txt
done
done
awk is generating empty files. I have tried using cat to call the file and then piping it to awk but that also hasn't worked.
You don't need a loop, and you can't redirect output to the same file awk is reading from. The shell truncates the output file before awk even starts, so awk finds nothing to read and you end up with blank files.
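To see the truncation for yourself, here is a minimal sketch run in a scratch directory. The file name demo.txt and the variable after_clobber are made up for the demonstration and are not part of the question.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing real gets clobbered.
cd "$(mktemp -d)" || exit 1
printf 'A 1\nB 2\n' > demo.txt          # hypothetical sample file

# The shell opens demo.txt for writing (emptying it) before awk runs,
# so awk has no input and the file stays empty.
awk '{print FILENAME "\t" $0}' demo.txt > demo.txt
if [ -s demo.txt ]; then after_clobber="not empty"; else after_clobber="empty"; fi
echo "after in-place redirect: $after_clobber"

# Writing to a separate file and renaming it afterwards avoids the problem.
printf 'A 1\nB 2\n' > demo.txt
awk '{print FILENAME "\t" $0}' demo.txt > demo.txt.tmp && mv demo.txt.tmp demo.txt
cat demo.txt
```

Piping through cat doesn't help for the same reason: the shell sets up the output redirection before any command in the pipeline gets a chance to read the file.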
Try this:
#!/bin/bash
N=$SLURM_ARRAY_TASK_ID
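awk '
NF {
    # write each non-empty line, prefixed with its file name, to <file>.tmp
    print FILENAME "\t" $0 > (FILENAME ".tmp")
}
ENDFILE { # requires gawk
    close(FILENAME ".tmp")
}' file."$N".{a,b,c}.{1,2,3}.txt
# only move the .tmp files that belong to this array task
for file in file."$N".*.tmp; do
    mv "$file" "${file%.tmp}"
done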
Note that if you don't have GNU awk and can't use ENDFILE{}, you can remove that stanza and get away with either:
1. Adding a close() statement just after the print statement (comes with lots of overhead). If you do this, change > to >> in the print, because reopening a closed file with > truncates it.
2. Not using close() at all; as long as you don't have a lot of files, you should be fine.
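If ENDFILE isn't available, another option (a swapped-in technique, not the code from the answer above) is to close the previous output file whenever a new input file starts, which FNR == 1 detects. The sketch below is only an illustration: the scratch directory, the sample data, the fallback value for N, and the variable names out and tmp are all assumptions added so it can be run outside Slurm.

```shell
#!/bin/bash
# Demo setup (assumption): scratch directory with small sample files.
cd "$(mktemp -d)" || exit 1
N=${SLURM_ARRAY_TASK_ID:-1}             # falls back to 1 when not run under Slurm
for j in a b c; do
    for i in 1 2 3; do
        printf 'A 1\nB 2\n' > "file.$N.$j.$i.txt"
    done
done

awk '
FNR == 1 {                              # first record of each input file
    if (out != "") close(out)           # close the previous output file, if any
    out = FILENAME ".tmp"
}
NF { print FILENAME "\t" $0 > out }     # skip blank lines, as in the gawk version
' file."$N".{a,b,c}.{1,2,3}.txt

# Replace each original with its .tmp copy, limited to this task's files.
for tmp in file."$N".*.tmp; do
    [ -e "$tmp" ] || continue           # nothing matched the glob
    mv "$tmp" "${tmp%.tmp}"
done
```

This keeps only one output file open at a time without closing after every line. An input file that is completely empty never triggers FNR == 1, so it gets no .tmp file and is left untouched.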