Build 2 lists in one go while reading from file, pythonically

Question

I'm reading a big file with hundreds of thousands of number pairs representing the edges of a graph. I want to build 2 lists as I go: one with the forward edges and one with the reversed.

Currently I'm doing an explicit for loop, because I need to do some pre-processing on the lines I read. However, I'm wondering if there is a more pythonic approach to building those lists, like list comprehensions, etc.

But, as I have 2 lists, I don't see a way to populate them using comprehensions without reading the file twice.

My code right now is:

edge_list, reversed_edge_list = [], []
with open('SCC.txt') as data:
    for line in data:
        line = line.rstrip()
        if line:
            edge_list.append((int(line.rstrip().split()[0]), int(line.rstrip().split()[1])))
            reversed_edge_list.append((int(line.rstrip().split()[1]), int(line.rstrip().split()[0])))

Padraic Cunningham · Accepted Answer

I would keep your logic as it is: an explicit loop is the Pythonic approach here. Just don't split/rstrip the same line multiple times:

edge_list, reversed_edge_list = [], []
with open('SCC.txt') as data:
    for line in data:
        spl = line.split()
        if spl:
            i, j = map(int, spl)
            edge_list.append((i, j))
            reversed_edge_list.append((j, i))

Calling rstrip on a line you have already stripped is redundant in itself, and even more so when you are splitting, because split() with no arguments already discards the surrounding whitespace. Splitting just once saves a lot of unnecessary work.
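
A quick check of that claim, using a few made-up sample lines (they are not from the asker's file): a bare split() drops the trailing newline and any extra whitespace, and returns an empty list for a blank line, which is why the if spl: test is enough.

```python
# Invented sample lines: normal, padded with spaces/tabs, and two blank ones.
samples = ["1 2\n", "  1 \t 2  \n", "\n", "   \n"]

# split() with no arguments splits on runs of whitespace and ignores
# leading/trailing whitespace, so no rstrip() is needed beforehand.
results = [s.split() for s in samples]

print(results)  # [['1', '2'], ['1', '2'], [], []]
```

An empty list is falsy, so blank lines are skipped without a separate strip step.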

You can also use csv.reader to read the data and filter out empty rows, provided the values are delimited by exactly one space:

from csv import reader

with open('SCC.txt') as data:
    edge_list, reversed_edge_list = [], []
    for i, j in filter(None, reader(data, delimiter=" ")):
        i, j = int(i), int(j)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))
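
To see why the single-space condition matters, here is a small sketch with invented sample text wrapped in io.StringIO in place of the real file. csv.reader returns an empty list for a blank line (which filter(None, ...) drops), but a doubled space produces an extra empty field.

```python
from csv import reader
from io import StringIO

# Invented sample: one normal row, one blank line, one row with two spaces.
sample = StringIO("1 2\n\n3  4\n", newline="")

rows = list(reader(sample, delimiter=" "))
print(rows)  # [['1', '2'], [], ['3', '', '4']]

# filter(None, ...) removes the empty row but not the three-field row,
# so unpacking that row into i, j would raise a ValueError.
non_empty = list(filter(None, rows))
print(non_empty)  # [['1', '2'], ['3', '', '4']]
```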

Or, if the values may be separated by multiple whitespace characters, you can use map(str.split, data):

    for i, j in filter(None, map(str.split, data)):
        i, j = int(i), int(j)
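
For completeness, a self-contained sketch of that variant. The io.StringIO object and its contents are stand-ins I made up for open('SCC.txt'); everything else follows the loop above.

```python
from io import StringIO

# Stand-in for the real file: blank line, repeated spaces and a tab included.
data = StringIO("1 2\n\n3   4\n5\t6\n")

edge_list, reversed_edge_list = [], []
# str.split handles any run of whitespace; filter(None, ...) drops the
# empty lists produced by blank lines.
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
    edge_list.append((i, j))
    reversed_edge_list.append((j, i))

print(edge_list)           # [(1, 2), (3, 4), (5, 6)]
print(reversed_edge_list)  # [(2, 1), (4, 3), (6, 5)]
```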

Whichever you choose will be faster than going over the data twice or splitting the same lines multiple times.

Dimitris Fasarakis Hilliard · Answer

You can't create two lists in one comprehension, so, instead of doing the same operations twice on the two lists, one viable option would be to initialize one of them and then create the second one by reversing each entry in the first one. That way you don't iterate over the file twice.

To that end, you could create the first list edge_list with a comprehension (not sure why you called rstrip again on it):

edge_list = [tuple(map(int, line.split())) for line in data if line.strip()]

And now go through each entry and reverse it with [::-1] in order to create its reversed sibling reverse_edge_list.

Using mock data for edge_list:

edge_list = [(1, 2), (3, 4), (5, 6)]

Reversing it could look like this:

reverse_edge_list = [t[::-1] for t in edge_list]

Which now looks like:

reverse_edge_list
[(2, 1), (4, 3), (6, 5)]
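
Putting the two steps together, a minimal sketch that reads the input once and derives the reversed list from memory. The io.StringIO sample is invented and stands in for the open file; the if line.strip() filter is there so blank lines don't end up as empty tuples.

```python
from io import StringIO

# Invented stand-in for the file, including one blank line.
data = StringIO("1 2\n\n3 4\n5 6\n")

# Parse each non-blank line exactly once.
edge_list = [tuple(map(int, line.split())) for line in data if line.strip()]

# Build the reversed list from edge_list, not from the file.
reverse_edge_list = [t[::-1] for t in edge_list]

print(edge_list)          # [(1, 2), (3, 4), (5, 6)]
print(reverse_edge_list)  # [(2, 1), (4, 3), (6, 5)]
```

This still walks the data twice in memory, but the file itself is read and parsed only once.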

khael · Answer

Maybe not clearer, but shorter:

with open('SCC.txt') as data:
    process_line = lambda line, r: (int(line.rstrip().split()[r]), int(line.rstrip().split()[1-r]))

    edge_list, reversed_edge_list = map(list, zip(*[(process_line(line, 0), process_line(line, 1))
                                                    for line in data
                                                    if line.rstrip()]))
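
One thing to watch with the map(list, zip(*...)) idiom: if the input has no usable lines, zip(*[]) yields nothing and the two-name unpacking raises a ValueError. A small sketch of a guard, where split_pairs is a hypothetical helper name of mine and not part of the answer:

```python
def split_pairs(pairs):
    """Transpose a list of (forward, reversed) pairs into two lists."""
    if not pairs:
        # zip(*[]) is empty, so there would be nothing to unpack.
        return [], []
    forward, backward = map(list, zip(*pairs))
    return forward, backward


# The unguarded form fails on empty input.
try:
    a, b = map(list, zip(*[]))
    unpack_failed = False
except ValueError:
    unpack_failed = True

empty_result = split_pairs([])
sample_result = split_pairs([((1, 2), (2, 1)), ((3, 4), (4, 3))])

print(unpack_failed)  # True
print(empty_result)   # ([], [])
print(sample_result)  # ([(1, 2), (3, 4)], [(2, 1), (4, 3)])
```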

Netwave · Answer

Here is a solution.

A test file:

In[19]: f = ["{} {}".format(i,j) for i,j in zip(range(10), range(10, 20))]
In[20]: f
Out[20]: 
['0 10',
 '1 11',
 '2 12',
 '3 13',
 '4 14',
 '5 15',
 '6 16',
 '7 17',
 '8 18',
 '9 19']

One liner using comprehension, zip and map:

In[27]: l, l2 = map(list,zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
In[28]: l
Out[28]: 
[(0, 10),
 (1, 11),
 (2, 12),
 (3, 13),
 (4, 14),
 (5, 15),
 (6, 16),
 (7, 17),
 (8, 18),
 (9, 19)]
In[29]: l2
Out[29]: 
[(10, 0),
 (11, 1),
 (12, 2),
 (13, 3),
 (14, 4),
 (15, 5),
 (16, 6),
 (17, 7),
 (18, 8),
 (19, 9)]

To explain: with [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f] we build a list of pairs, each holding an edge tuple and its reversed form:

In[24]: [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]
Out[24]: 
[((0, 10), (10, 0)),
 ((1, 11), (11, 1)),
 ((2, 12), (12, 2)),
 ((3, 13), (13, 3)),
 ((4, 14), (14, 4)),
 ((5, 15), (15, 5)),
 ((6, 16), (16, 6)),
 ((7, 17), (17, 7)),
 ((8, 18), (18, 8)),
 ((9, 19), (19, 9))]

Applying zip to the unpacked list regroups those pairs, so we get 2 tuples: the first holds the forward tuples and the second holds the reversed ones:

In[25]: list(zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
Out[25]: 
[((0, 10),
  (1, 11),
  (2, 12),
  (3, 13),
  (4, 14),
  (5, 15),
  (6, 16),
  (7, 17),
  (8, 18),
  (9, 19)),
 ((10, 0),
  (11, 1),
  (12, 2),
  (13, 3),
  (14, 4),
  (15, 5),
  (16, 6),
  (17, 7),
  (18, 8),
  (19, 9))]

Almost there: we just use map to transform those tuples into lists.

EDIT: as @PadraicCunningham asked, to filter empty lines, just add an if x to the comprehension: [ ... for x in f if x]
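
One caveat when moving from the in-memory test list to a real file: lines read from a file keep their trailing newline, so a blank line arrives as "\n", which is truthy and passes a plain if x. A minimal sketch, assuming an invented three-line sample wrapped in io.StringIO, that filters on x.strip() instead:

```python
from io import StringIO

# Invented stand-in for a file with a blank line in the middle.
f = StringIO("0 10\n\n1 11\n")

# A blank line read from a file is "\n", which is truthy.
blank_is_truthy = bool("\n")

# Filtering on x.strip() skips blank lines that still carry a newline.
pairs = [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1])
         for x in f if x.strip()]
l, l2 = map(list, zip(*pairs))

print(blank_is_truthy)  # True
print(l)                # [(0, 10), (1, 11)]
print(l2)               # [(10, 0), (11, 1)]
```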

Tags:

python

list

python-3.x

Nick Slavsky
