I am working with a huge data set. The format looks like this:
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
The '1' values can differ from line to line. My goal is to identify the lines containing 'B' (usually two, but three or four consecutive 'B' lines are possible) and extract them together with their surrounding lines (e.g., the two preceding 'A' lines and the two following 'C' lines). There are several blocks like this, so I was considering using a for loop to read the file line by line and record the position every time an 'A' line is followed by a 'B' line. I tried:
```python
for line in file:
    # next(file) advances the same iterator that the for loop uses
    if 'A' in line and 'B' in next(file):
        ...
```
But some lines seem to get lost. My question is: how can I reliably identify an A-B (or B-C) line pair using a for loop? And once I have found it, how can I easily go backwards (or forwards) several lines to extract all of them within the loop?
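One way to detect A-B and B-C pairs without skipping lines is to remember the previous line's label instead of calling `next()` inside the loop: `next()` advances the same iterator the `for` loop is using, so the line it returns is never seen by the loop body. The sketch below assumes the label is always the 9th whitespace-separated field (as in the sample above) and uses a small in-memory sample via `io.StringIO`; `find_boundaries` and `SAMPLE` are hypothetical names.

```python
import io

# Hypothetical sample in the question's format.
SAMPLE = """\
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
"""

def find_boundaries(lines):
    """Return 1-based line numbers where A->B and B->C transitions occur.

    Each line is compared with the previous one, so every line is visited.
    """
    a_to_b, b_to_c = [], []
    prev_label = None
    for i, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        label = fields[8]  # assumption: label is the 9th column
        if prev_label == "A" and label == "B":
            a_to_b.append(i)  # i is the first B line of the run
        elif prev_label == "B" and label == "C":
            b_to_c.append(i)  # i is the first C line after the run
        prev_label = label
    return a_to_b, b_to_c

a_to_b, b_to_c = find_boundaries(io.StringIO(SAMPLE))
print(a_to_b, b_to_c)  # [4] [6]
```

With those line numbers, the context lines are simply `a_to_b[k] - 2` through `b_to_c[k] + 1`.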
The `linecache` module can fetch lines from a file by line number. You can use it to mark the boundary points (A-B and B-C) as you go through the file, and then loop over the line numbers in between to collect the output you want.
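A small demonstration of the two `linecache.getline` properties this approach relies on: line numbers start at 1, and an out-of-range line number returns an empty string instead of raising. Note that `linecache` reads and caches the whole file in memory, which may matter for a very large data set. The temporary file and its contents are purely illustrative.

```python
import linecache
import os
import tempfile

# Write a small hypothetical file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

first = linecache.getline(path, 1)        # numbering starts at 1
third = linecache.getline(path, 3)
before_start = linecache.getline(path, 0)  # out of range: '' (no exception)
past_end = linecache.getline(path, 99)     # out of range: ''

print(repr(first), repr(third), repr(before_start), repr(past_end))

linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
```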
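```python
import linecache

final_lines = []
linestart = None  # first line number of the current block
with open("file.txt") as f:
    for i, line in enumerate(f, 1):  # linecache numbers lines from 1
        if "B" in line:
            # A-B boundary: the block starts 2 lines before this B line
            if "A" in linecache.getline("file.txt", i - 1):
                linestart = max(1, i - 2)
            # B-C boundary: the block ends 2 lines after this B line
            if "C" in linecache.getline("file.txt", i + 1) and linestart is not None:
                lineend = i + 2
                for j in range(linestart, lineend + 1):
                    final_lines.append(linecache.getline("file.txt", j))
                linestart = None  # wait for the next A-B boundary
print("".join(final_lines))
```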
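If loading the whole file into memory through `linecache` is a concern, a single pass with a bounded `collections.deque` can keep the preceding context lines, plus a counter for the trailing ones. This is a swapped-in technique, not the approach above: a minimal sketch assuming the label is the 9th whitespace-separated field; `extract_blocks` and the sample data are hypothetical. It handles runs of any number of consecutive 'B' lines and several blocks per file.

```python
import io
from collections import deque

def extract_blocks(lines, before=2, after=2, target="B"):
    """Yield each run of `target` lines with up to `before` preceding
    and `after` following lines of context, reading the input once."""
    history = deque(maxlen=before)  # the last `before` lines seen
    block = []      # lines of the block being collected
    remaining = 0   # trailing context lines still needed
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        if fields[8] == target:  # assumption: label is the 9th column
            if not block:
                block.extend(history)  # leading context
            block.append(line)
            remaining = after  # count trailing lines from the last target line
        elif block:
            if remaining > 0:
                block.append(line)  # trailing context
                remaining -= 1
            if remaining == 0:
                yield block
                block = []
        history.append(line)
    if block:
        yield block  # input ended before all trailing context was seen

# Hypothetical data: first column is the line number, label in column 9.
labels = "AAABBCCCAABBBCC"
SAMPLE = "".join(f"{n} 1 1 1 1 1 1 1 {lab} 1 1 1 1\n"
                 for n, lab in enumerate(labels, 1))

with io.StringIO(SAMPLE) as f:  # use open("file.txt") for a real file
    blocks = list(extract_blocks(f))

for block in blocks:
    print("".join(block), end="")
    print("---")
```

Here the first block spans sample lines 2 to 7 and the second spans lines 9 to 15 (three consecutive 'B' lines).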