I'm reading a big file with hundreds of thousands of number pairs representing the edges of a graph. I want to build 2 lists as I go: one with the forward edges and one with the reversed.
Currently I'm doing an explicit for loop, because I need to do some pre-processing on the lines I read. However, I'm wondering if there is a more Pythonic approach to building those lists, like list comprehensions, etc.
But, as I have 2 lists, I don't see a way to populate them using comprehensions without reading the file twice.
My code right now is:
with open('SCC.txt') as data:
    for line in data:
        line = line.rstrip()
        if line:
            edge_list.append((int(line.rstrip().split()[0]), int(line.rstrip().split()[1])))
            reversed_edge_list.append((int(line.rstrip().split()[1]), int(line.rstrip().split()[0])))
I would keep your logic, as an explicit loop is the Pythonic approach here; just don't split/rstrip the same line multiple times:
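edge_list, reversed_edge_list = [], []
with open('SCC.txt') as data:
    for line in data:
        spl = line.split()  # split once; also drops the trailing newline
        if spl:  # blank lines split to an empty list
            i, j = map(int, spl)
            edge_list.append((i, j))
            reversed_edge_list.append((j, i))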
Calling rstrip again on a line you have already stripped is redundant, and it is unnecessary anyway when you split, because split() with no arguments already discards the surrounding whitespace. Splitting just once saves a lot of unnecessary work.
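To make the point about split() concrete, here is a minimal sketch. The helper name parse_edge is hypothetical, introduced only for illustration; it assumes each non-blank line holds exactly two integers.

```python
def parse_edge(line):
    """Return (i, j) for a line such as '12 34\n', or None for a blank line."""
    # split() with no arguments splits on any run of whitespace and ignores
    # leading/trailing whitespace, so no rstrip() is needed beforehand.
    parts = line.split()
    if not parts:  # a blank line (even one holding only '\n') gives []
        return None
    i, j = map(int, parts)
    return i, j


edge = parse_edge("  12   34 \n")
blank = parse_edge("\n")
```

A line with extra spaces and a trailing newline still parses to a clean pair, and a blank line is reported as None instead of raising.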
You can also use csv.reader to read the data and filter out empty rows, provided a single space delimits the values:
from csv import reader
with open('SCC.txt') as data:
    edge_list, reversed_edge_list = [], []
    for i, j in filter(None, reader(data, delimiter=" ")):
        i, j = int(i), int(j)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))
Or, if multiple whitespace characters delimit the values, you can use map(str.split, data):
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
Whatever you choose will be faster than going over the data twice or splitting the same lines multiple times.
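For anyone who wants to try the map(str.split, data) variant without the real file, here is a self-contained sketch. The io.StringIO object and its contents are made up and only stand in for open('SCC.txt').

```python
import io

# Stand-in for the real file; the sample contents are an assumption.
data = io.StringIO("1 2\n\n3   4\n5 6\n")

edge_list, reversed_edge_list = [], []
# map(str.split, data) splits each line once; filter(None, ...) drops the
# empty lists produced by blank lines.
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
    edge_list.append((i, j))
    reversed_edge_list.append((j, i))
```

The input is read in a single pass and both lists are filled in the same loop iteration.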
You can't create two lists in one comprehension, so, instead of doing the same operations twice on the two lists, one viable option would be to initialize one of them and then create the second one by reversing each entry in the first one. That way you don't iterate over the file twice.
To that end, you could create the first list edge_list with a comprehension (not sure why you called rstrip again on it):
edge_list = [tuple(map(int, line.split())) for line in data]
And now go through each entry and reverse it with [::-1] in order to create its reversed sibling reverse_edge_list.
Using mock data for edge_list:
edge_list = [(1, 2), (3, 4), (5, 6)]
Reversing it could look like this:
reverse_edge_list = [t[::-1] for t in edge_list]
Which now looks like:
reverse_edge_list
[(2, 1), (4, 3), (6, 5)]
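Putting the two steps together, a minimal sketch might look like the following. The function name build_edge_lists and the sample lines are hypothetical, and the if line.strip() filter is an addition to skip blank lines, as the question's loop does.

```python
def build_edge_lists(lines):
    """Build the forward list in one pass, then derive the reversed list."""
    # Skip blank lines so they do not turn into empty tuples.
    edge_list = [tuple(map(int, line.split())) for line in lines if line.strip()]
    # Reversing each tuple walks the in-memory list, not the file.
    reverse_edge_list = [t[::-1] for t in edge_list]
    return edge_list, reverse_edge_list


sample = ["1 2\n", "\n", "3 4\n", "5 6\n"]  # stands in for the open file
edges, rev = build_edge_lists(sample)
```

The file is read only once; the second comprehension iterates over the list already held in memory.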
Maybe not clearer, but shorter:
with open('SCC.txt') as data:
    process_line = lambda line, r: (int(line.rstrip().split()[r]), int(line.rstrip().split()[1-r]))
    edge_list, reversed_edge_list = map(list, zip(*[(process_line(line, 0), process_line(line, 1))
                                                    for line in data
                                                    if line.rstrip()]))
Here comes a solution
A test file:
In[19]: f = ["{} {}".format(i,j) for i,j in zip(xrange(10), xrange(10, 20))]
In[20]: f
Out[20]:
['0 10',
'1 11',
'2 12',
'3 13',
'4 14',
'5 15',
'6 16',
'7 17',
'8 18',
'9 19']
One liner using comprehension, zip and map:
In[27]: l, l2 = map(list,zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
In[28]: l
Out[28]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In[29]: l2
Out[29]:
[(10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9)]
To explain: with [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f] we build a list of 2-tuples, each holding an edge tuple and its reversed form:
In[24]: [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]
Out[24]:
[((0, 10), (10, 0)),
((1, 11), (11, 1)),
((2, 12), (12, 2)),
((3, 13), (13, 3)),
((4, 14), (14, 4)),
((5, 15), (15, 5)),
((6, 16), (16, 6)),
((7, 17), (17, 7)),
((8, 18), (18, 8)),
((9, 19), (19, 9))]
Applying zip to the unpacked list regroups those pairs, so we get two tuples: the first contains the original edge tuples and the second contains the reversed ones:
In[25]: zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f])
Out[25]:
[((0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)),
((10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9))]
Almost there: we just use map to turn those two tuples into lists.
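EDIT:
As @PadraicCunningham asked, to filter empty lines just add an if x to the comprehension: [ ... for x in f if x]. When iterating over a real file, blank lines still contain the newline character, so use if x.strip() there.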
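The session above appears to be Python 2 (xrange, and zip returning a list). As a rough Python 3 sketch of the same idea, splitting each line only once, something like this should work; the sample lines and the intermediate name pairs are assumptions for illustration.

```python
# Sample input standing in for the file lines; includes a blank entry.
f = ["0 10", "", "1 11", "2 12"]

# Parse each non-blank line exactly once.
pairs = [tuple(map(int, x.split())) for x in f if x.strip()]

# zip(*...) regroups (edge, reversed_edge) pairs into two tuples,
# and map(list, ...) turns each of them into a list.
l, l2 = map(list, zip(*((p, p[::-1]) for p in pairs)))
```

Note that the unpacking into l, l2 raises a ValueError if the input has no non-blank lines, because zip(*...) then yields nothing.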