I have two files as shown below:
File 1 (tab delimited):
A1 someinfo1 someinfo2 someinfo3
A1 someinfo1 someinfo2 someinfo3
B1 someinfo1 someinfo2 someinfo3
B1 someinfo1 someinfo2 someinfo3
File 2 (tab delimited):
A1 newinfo1 newinfo2 newinfo3
A1 newinfo1 newinfo2 newinfo3
B1 newinfo1 newinfo2 newinfo3
B1 newinfo1 newinfo2 newinfo3
I want to read two consecutive lines from File 1 (the two lines starting with A1) together with the corresponding two lines from File 2 (also starting with A1). More precisely, I have two requirements:
1) Read two lines at a time from the same file.
2) Read the same two lines from the other file.
In other words, I want to process four lines at once: two consecutive lines from each of the two files.
I searched online and found code that reads two lines at a time, but only from one file:
with open(File1) as file1:
    for line1,line2 in itertools.izip_longest(*[file1]*2):
I was also able to read one line from each of the two files:
for i,(line1,line2) in enumerate(itertools.izip(f1,f2)):
    print line1, line2
But I want to do something like this:
Pseudocode:
for line1, line2 from file1 and line_1 and line_2 from file2:
    compare line1 with line2
    compare line1 with line_1
    compare line2 with line_1
    compare line2 with line_2
I am hoping for a linear-time solution. Both files have the same number of lines. The first column (the primary ID) is the same for each pair of consecutive lines within a file, and the other file follows the same order (see the example above).
Thanks.
How about this:
with open('a') as A, open('b') as B:
    while True:
        try:
            lineA1, lineA2, lineB1, lineB2 = next(A), next(A), next(B), next(B)
            # compare lines
            # ...
        except StopIteration:
            break
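The same idea can be wrapped in a generator so the loop body stays separate from the reading logic. This is only a sketch, assuming Python 3; the name `read_in_fours` and the in-memory `io.StringIO` stand-ins for the two files are made up for illustration and are not part of the answer above.

```python
import io

def read_in_fours(file_a, file_b):
    """Yield (a1, a2, b1, b2): two consecutive lines from each file."""
    while True:
        try:
            a1, a2 = next(file_a), next(file_a)
            b1, b2 = next(file_b), next(file_b)
        except StopIteration:
            # Either file ran out of lines; stop the generator.
            return
        yield tuple(line.rstrip("\n") for line in (a1, a2, b1, b2))

# Hypothetical sample data shaped like the question's tab-delimited files.
file1 = io.StringIO("A1\told1\nA1\told2\nB1\told3\nB1\told4\n")
file2 = io.StringIO("A1\tnew1\nA1\tnew2\nB1\tnew3\nB1\tnew4\n")

groups = list(read_in_fours(file1, file2))
```

Each file is read once, front to back, so the whole pass is linear in the number of lines.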
Let's see how we can put these together. First:
with open(File1) as file1:
    for line1,line2 in itertools.izip_longest(*[file1]*2):
Well, take out the for loop and you've got a 2-line-at-a-time iterator over the file, right? So, you can do the same for file2. And then you can zip them together:
with open(File1) as file1, open(File2) as file2:
    f1 = itertools.izip_longest(*[file1]*2)
    f2 = itertools.izip_longest(*[file2]*2)
    for i,((f1_line1, f1_line2), (f2_line1, f2_line2)) in enumerate(itertools.izip(f1,f2)):
        # do stuff
But you really don't want to do this.
First, most people don't intuitively read izip_longest(*[file1]*2) and realize that it's grouping by pairs. Wrap that up as a function. In fact, don't even write the function yourself; take grouper right out of the itertools documentation.
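For reference, here is a sketch of what such a grouper function looks like, written for Python 3 (where `izip_longest` is named `zip_longest`). The argument order `(n, iterable)` follows the calls used in this answer; as far as I know, newer versions of the itertools documentation list the recipe as `grouper(iterable, n)`, so check the version you copy from.

```python
from itertools import zip_longest  # izip_longest in Python 2

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator repeated n times: each output tuple pulls
    # n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Five items in pairs: the last pair is padded with the fill value.
pairs = list(grouper(2, ["A1 x", "A1 y", "B1 x", "B1 y", "C1 x"]))
```

The padding matters only when the input length is not a multiple of `n`, which the question says cannot happen here.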
So now, it's:
with open(File1) as file1, open(File2) as file2:
    pairs1 = grouper(2, file1)
    pairs2 = grouper(2, file2)
    for i,((f1_line1, f1_line2), (f2_line1, f2_line2)) in enumerate(itertools.izip(pairs1, pairs2)):
        # do stuff
Next, pattern-matching may be cool, but a nested pattern to decompose right in the middle of a complicated expression is a little too much. So, let's break it up, and un-nest things by borrowing flatten from the itertools docs again:
with open(File1) as file1, open(File2) as file2:
    pairs1 = grouper(2, file1)
    pairs2 = grouper(2, file2)
    zipped_pairs = itertools.izip(pairs1, pairs2)
    for i, zipped_pair in enumerate(zipped_pairs):
        f1_line1, f1_line2, f2_line1, f2_line2 = flatten(zipped_pair)
        # do stuff
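To try this end to end, the snippet can be made self-contained along these lines. It is a sketch assuming Python 3 (`zip` and `zip_longest` in place of `izip` and `izip_longest`), with `grouper` and `flatten` written out in the spirit of the itertools recipes and hypothetical in-memory sample data standing in for the real files.

```python
import io
from itertools import chain, zip_longest

def grouper(n, iterable, fillvalue=None):
    """Collect items into fixed-length tuples."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def flatten(list_of_lists):
    """Flatten one level of nesting."""
    return chain.from_iterable(list_of_lists)

# Hypothetical stand-ins for the two tab-delimited files.
file1 = io.StringIO("A1\told1\nA1\told2\nB1\told3\nB1\told4\n")
file2 = io.StringIO("A1\tnew1\nA1\tnew2\nB1\tnew3\nB1\tnew4\n")

results = []
zipped_pairs = zip(grouper(2, file1), grouper(2, file2))
for i, zipped_pair in enumerate(zipped_pairs):
    f1_line1, f1_line2, f2_line1, f2_line2 = flatten(zipped_pair)
    # "do stuff": here, just collect the primary ID of each of the four lines.
    ids = [line.split("\t")[0] for line in (f1_line1, f1_line2, f2_line1, f2_line2)]
    results.append((i, ids))
```

With the sample data, every group of four lines shares one primary ID, which is the precondition the question states.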
The advantage of this solution is that it's abstract and generic, which means if you later decide you need groups of 5 lines, or 3 files, the change is obvious.
The disadvantage of this solution is that it's abstract and generic, which means it can't possibly be as simple as doing the concrete equivalent. (For example, if you didn't zip up a pair of groupers, you wouldn't have to flatten the result.)
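One possible reading of "the concrete equivalent" is to pass each file object to zip twice, which yields four lines per step with nothing to flatten. This is a sketch assuming Python 3, where `zip` is lazy (in Python 2 you would want `itertools.izip` instead); the sample data is hypothetical, and this may not be exactly what the answer had in mind.

```python
import io

# Hypothetical stand-ins for the two tab-delimited files.
file1 = io.StringIO("A1\told1\nA1\told2\nB1\told3\nB1\told4\n")
file2 = io.StringIO("A1\tnew1\nA1\tnew2\nB1\tnew3\nB1\tnew4\n")

rows = []
# zip pulls from its arguments left to right, so each step consumes
# two lines from file1 and then two lines from file2.
for a1, a2, b1, b2 in zip(file1, file1, file2, file2):
    rows.append(tuple(line.rstrip("\n") for line in (a1, a2, b1, b2)))
```

Note that zip stops at the shortest input, so a trailing unpaired line would be dropped silently rather than padded.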