
Build 2 lists in one go while reading from file, pythonically

I'm reading a big file with hundreds of thousands of number pairs representing the edges of a graph. I want to build 2 lists as I go: one with the forward edges and one with the reversed.

Currently I'm doing an explicit for loop, because I need to do some pre-processing on the lines I read. However, I'm wondering if there is a more pythonic approach to building those lists, like list comprehensions, etc.

But, as I have 2 lists, I don't see a way to populate them using comprehensions without reading the file twice.

My code right now is:

with open('SCC.txt') as data:
    for line in data:
        line = line.rstrip()
        if line:
            edge_list.append((int(line.rstrip().split()[0]), int(line.rstrip().split()[1])))
            reversed_edge_list.append((int(line.rstrip().split()[1]), int(line.rstrip().split()[0])))
Nick Slavsky asked Sep 01 '16 10:09


4 Answers

I would keep your logic, as an explicit loop is the Pythonic approach here; just don't split/rstrip the same line multiple times:

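edge_list, reversed_edge_list = [], []
with open('SCC.txt') as data:
    for line in data:
        spl = line.split()  # split once; this also drops surrounding whitespace
        if spl:  # an empty list means the line was blank
            i, j = map(int, spl)
            edge_list.append((i, j))
            reversed_edge_list.append((j, i))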

Calling rstrip on a line you have already stripped is redundant, and even more so when you split it afterwards, because split() with no arguments already discards leading and trailing whitespace. Splitting just once saves a lot of unnecessary work.
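The whitespace behaviour of split() mentioned above can be checked directly. This is a minimal sketch; the sample strings are made up for illustration and are not from the asker's file:

```python
# str.split() with no separator splits on runs of whitespace and ignores
# leading/trailing whitespace, so a separate rstrip() call is unnecessary.
sample_lines = ["1 2\n", "  3   4  \n", "\n", "   \n"]  # made-up sample input

parsed = [line.split() for line in sample_lines]
print(parsed)  # [['1', '2'], ['3', '4'], [], []]

# Empty lists are falsy, which is why `if spl:` skips blank lines.
non_blank = [spl for spl in parsed if spl]
print(non_blank)  # [['1', '2'], ['3', '4']]
```

Blank and whitespace-only lines both come back as an empty list, so the single `if spl:` test covers both cases.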

You can also use csv.reader to read the data and filter out empty rows, provided exactly one space separates the two values:

from csv import reader

with open('SCC.txt') as data:
    edge_list, reversed_edge_list = [], []
    for i, j in filter(None, reader(data, delimiter=" ")):
        i, j = int(i), int(j)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))

Or, if the values may be separated by more than one whitespace character, you can use map(str.split, data):

    for i, j in filter(None, map(str.split, data)):
        i, j = int(i), int(j)

Whichever you choose will be faster than going over the data twice or splitting the same lines multiple times.
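The multiple-whitespace variant appears above only as a loop fragment, so here is a complete sketch of it. It assumes an in-memory io.StringIO with made-up contents in place of open('SCC.txt'), purely so the example runs on its own:

```python
import io

# io.StringIO stands in for open('SCC.txt'); the contents are made up.
data = io.StringIO("1  2\n\n3\t4\n   \n5 6\n")

edge_list, reversed_edge_list = [], []
# map(str.split, data) splits each line exactly once; filter(None, ...)
# drops the empty lists produced by blank or whitespace-only lines.
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
    edge_list.append((i, j))
    reversed_edge_list.append((j, i))

print(edge_list)           # [(1, 2), (3, 4), (5, 6)]
print(reversed_edge_list)  # [(2, 1), (4, 3), (6, 5)]
```

Both lists are filled in a single pass, and each line is split only once.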

Padraic Cunningham answered Oct 02 '22 19:10


You can't create two lists in one comprehension, so, instead of doing the same operations twice on the two lists, one viable option would be to initialize one of them and then create the second one by reversing each entry in the first one. That way you don't iterate over the file twice.

To that end, you could create the first list edge_list with a comprehension (not sure why you called rstrip again on it):

edge_list = [tuple(map(int, line.split())) for line in data]

And now go through each entry and reverse it with [::-1] in order to create its reversed sibling reverse_edge_list.

Using mock data for edge_list:

edge_list = [(1, 2), (3, 4), (5, 6)]

Reversing it could look like this:

reverse_edge_list = [t[::-1] for t in edge_list]

Which now looks like:

reverse_edge_list
[(2, 1), (4, 3), (6, 5)]
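Putting the two steps together, a minimal end-to-end sketch might look like the following. The io.StringIO input is a made-up stand-in for the real file, and the `if line.strip()` filter is an addition here so that blank lines do not turn into empty tuples:

```python
import io

# Stand-in for open('SCC.txt'); the contents are made up for illustration.
data = io.StringIO("1 2\n3 4\n\n5 6\n")

# Single pass over the file. The filter skips blank lines, which would
# otherwise become empty tuples in edge_list.
edge_list = [tuple(map(int, line.split())) for line in data if line.strip()]

# The second comprehension walks the in-memory list, not the file.
reverse_edge_list = [t[::-1] for t in edge_list]

print(edge_list)          # [(1, 2), (3, 4), (5, 6)]
print(reverse_edge_list)  # [(2, 1), (4, 3), (6, 5)]
```

The file is read once; the cost of this approach is a second pass over the list held in memory.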
Dimitris Fasarakis Hilliard answered Oct 02 '22 19:10


Maybe not clearer, but shorter:

with open('SCC.txt') as data:
    process_line = lambda line, r: (int(line.rstrip().split()[r]), int(line.rstrip().split()[1-r]))

    edge_list, reversed_edge_list = map(list, zip(*[(process_line(line, 0), process_line(line, 1))
                                                    for line in data
                                                    if line.rstrip()]))
khael answered Oct 02 '22 17:10


Here comes a solution

A test file:

In[19]: f = ["{} {}".format(i,j) for i,j in zip(xrange(10), xrange(10, 20))]
In[20]: f
Out[20]: 
['0 10',
 '1 11',
 '2 12',
 '3 13',
 '4 14',
 '5 15',
 '6 16',
 '7 17',
 '8 18',
 '9 19']

One liner using comprehension, zip and map:

In[27]: l, l2 = map(list,zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
In[28]: l
Out[28]: 
[(0, 10),
 (1, 11),
 (2, 12),
 (3, 13),
 (4, 14),
 (5, 15),
 (6, 16),
 (7, 17),
 (8, 18),
 (9, 19)]
In[29]: l2
Out[29]: 
[(10, 0),
 (11, 1),
 (12, 2),
 (13, 3),
 (14, 4),
 (15, 5),
 (16, 6),
 (17, 7),
 (18, 8),
 (19, 9)]

To explain: with [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f] we build a list of 2-tuples, each holding an edge tuple and its reversed form:

In[24]: [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]
Out[24]: 
[((0, 10), (10, 0)),
 ((1, 11), (11, 1)),
 ((2, 12), (12, 2)),
 ((3, 13), (13, 3)),
 ((4, 14), (14, 4)),
 ((5, 15), (15, 5)),
 ((6, 16), (16, 6)),
 ((7, 17), (17, 7)),
 ((8, 18), (18, 8)),
 ((9, 19), (19, 9))]

Applying zip to the unpacked list separates those pairs, so we get 2 tuples: the first contains the forward edge tuples and the second contains the reversed ones:

In[25]: zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f])
Out[25]: 
[((0, 10),
  (1, 11),
  (2, 12),
  (3, 13),
  (4, 14),
  (5, 15),
  (6, 16),
  (7, 17),
  (8, 18),
  (9, 19)),
 ((10, 0),
  (11, 1),
  (12, 2),
  (13, 3),
  (14, 4),
  (15, 5),
  (16, 6),
  (17, 7),
  (18, 8),
  (19, 9))]

Almost there: we just use map to transform those tuples into lists.

EDIT: as @PadraicCunningham asked, to filter out empty lines, just add an if x to the comprehension: [ ... for x in f if x]. Lines read straight from a file keep their trailing newline, so use if x.strip() there instead.
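The session above is Python 2 (xrange, and zip returning a list). A rough Python 3 sketch of the same zip(*...) idea follows; the helper name parse_pair and the sample lines are hypothetical, and the helper splits each line only once rather than twice:

```python
lines = ["0 10", "1 11", "", "2 12"]  # made-up sample, including a blank line

def parse_pair(line):
    """Hypothetical helper: split once, return (forward, reversed) tuples."""
    edge = tuple(map(int, line.split()))
    return edge, edge[::-1]

# Build the list of (forward, reversed) pairs, skipping blank lines.
pairs = [parse_pair(x) for x in lines if x.strip()]

# In Python 3, zip and map return iterators; the unpacking consumes them.
forward, backward = map(list, zip(*pairs))

print(forward)   # [(0, 10), (1, 11), (2, 12)]
print(backward)  # [(10, 0), (11, 1), (12, 2)]
```

One caveat with this pattern: if no line survives the filter, zip(*pairs) yields nothing and the two-name unpacking raises ValueError, so an empty input needs separate handling.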

Netwave answered Oct 02 '22 17:10