I have a large file like below, I want to split this file into multiple files. Each file should be break after ENDMDL. For the following file there will be three output files with name pose1.av, pose2.av and pose3.av.
MODEL 1
SML 170 O PRO A 17 16.893 3.030 0.799 1.00 1.00 O
SML 171 OXT PRO A 17 18.167 2.722 2.597 1.00 1.00 O
TER 172 PRO A 17
ENDMDL
MODEL 2
SML 4 CG ARG A 1 -2.171 -7.105 -4.278 1.00 1.00 C
SML 5 CD ARG A 1 -1.851 -8.581 -4.022 1.00 1.00 C
SML 113 HD1 HIS A 12 2.465 -8.206 5.062 1.00 1.00 H
TER 114 HIS A 12
ENDMDL
MODEL 3
SML 101 N HIS A 12 3.765 -3.995 7.233 1.00 1.00 N
SML 102 CA HIS A 12 2.584 -4.736 6.934 1.00 1.00 C
TER 103 HIS A 12
ENDMDL
A rather efficient one, using bash and sed:
n=0
while IFS= read -r firstline; do
{ echo "$firstline"; sed '/^ENDMDL$/q'; } > "pose$((++n)).av"
done < file
It's much more efficient than the other Bash answer: the output file is only opened once, and most of the parsing is done by sed, and not by bash.
csplit can do this out of the box
csplit -z -s -f pose -b "%01d.av" file '/^ENDMDL$/+1' '{*}'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With