I have a file data.txt like this.
>1BN5.txt
207
208
211
>1B24.txt
88
92
I have a folder F1 that contains text files.
1BN5.txt file in F1 folder is shown below.
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 422 C SER A 248 70.124 -29.955 8.226 1.00 55.81 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
ATOM 626 N MET B 87 1.054 -3.071 -5.633 1.00 10.00 N
ATOM 627 CA MET B 87 -0.213 -2.354 -5.826 1.00 10.00 C
1B24.txt file in F1 folder is shown below.
ATOM 630 CB MET B 87 -0.476 -2.140 -7.318 1.00 10.00 C
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
ATOM 644 CA ALA B 94 -2.560 -5.149 -4.675 1.00 10.00 C
I need only the lines containing 207,208,211(6th column)in 1BN5.txt file. I want to delete other lines in 1BN5.txt file. Like this, I need only the lines containing 88,92 in 1B24.txt file.
Desired output
1BN5.txt file
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
1B24.txt file
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
Here's one way using GNU awk
. Run like:
awk -f script.awk data.txt
Contents of script.awk
:
/^>/ {
file = substr($1,2)
next
}
{
a[file][$1]
}
END {
for (i in a) {
while ( ( getline line < ("./F1/" i) ) > 0 ) {
split(line,b)
for (j in a[i]) {
if (b[6]==j) {
print line > "./F1/" i ".new"
}
}
}
system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
}
}
Alternatively, here's the one-liner:
awk '/^>/ { file = substr($1,2); next } { a[file][$1] } END { for (i in a) { while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,b); for (j in a[i]) if (b[6]==j) print line > "./F1/" i ".new" } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
If you have an older version of awk
, older than GNU Awk 4.0.0
, you could try the following. Run like:
awk -f script.awk data.txt
Contents of script.awk:
/^>/ {
file = substr($1,2)
next
}
{
a[file]=( a[file] ? a[file] SUBSEP : "") $1
}
END {
for (i in a) {
split(a[i],b,SUBSEP)
while ( ( getline line < ("./F1/" i) ) > 0 ) {
split(line,c)
for (j in b) {
if (c[6]==b[j]) {
print line > "./F1/" i ".new"
}
}
}
system(sprintf("mv ./F1/%s.new ./F1/%s", i, i))
}
}
Alternatively, here's the one-liner:
awk '/^>/ { file = substr($1,2); next } { a[file]=( a[file] ? a[file] SUBSEP : "") $1 } END { for (i in a) { split(a[i],b,SUBSEP); while ( ( getline line < ("./F1/" i) ) > 0 ) { split(line,c); for (j in b) if (c[6]==b[j]) print line > "./F1/" i ".new" } system(sprintf("mv ./F1/%s.new ./F1/%s", i, i)) } }' data.txt
Please note that this script does exactly as you describe. It expects files like 1BN5.txt
and 1B24.txt
to reside in the folder F1
in the present working directory. It will also overwrite your original files. If this is not the desired behavior, drop the system()
call. HTH.
Results:
Contents of F1/1BN5.txt
:
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 421 CA SER A 207 68.627 -29.819 8.533 1.00 50.79 C
ATOM 615 H LEU B 208 3.361 -5.394 -6.021 1.00 10.00 H
ATOM 616 HA LEU B 211 2.930 -4.494 -3.302 1.00 10.00 H
Contents of F1/1B24.txt
:
ATOM 631 CG MET B 88 -0.828 -0.688 -7.575 1.00 10.00 C
ATOM 632 SD MET B 88 -2.380 -0.156 -6.830 1.00 10.00 S
ATOM 643 N ALA B 92 -1.541 -4.371 -5.366 1.00 10.00 N
Don't try to delete lines from the existing file, try to create a new file with only the lines you want to have:
cat 1bn5.txt | awk '$6 == 207 || $6 == 208 || $6 == 211 { print }' > output.txt
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With