How to group lines separated by a blank line more pythonically

I have a file containing a list of duplicate files that have unique names.

For example:

<md5sum>  /var/www/one.png
<md5sum>  /var/www/one-1.png

<md5sum>  /var/www/two.png
<md5sum>  /var/www/two-1.png
<md5sum>  /var/www/two-2.png

The goal is to end up with the following:

[
    [
        '/var/www/one.png',
        '/var/www/one-1.png'
    ],
    [
        '/var/www/two.png',
        '/var/www/two-1.png',
        '/var/www/two-2.png'
    ]
]

This is output from a command I ran earlier. Now I need to process this output, and I came up with the following code for starters:

from pprint import pprint
DUPES_FILE = './dupes.txt'

def process_dupes(dupes_file):
    groups = [[]]
    index = 0
    for line in dupes_file:
        if line != '\n':
            # Drop the trailing newline so it does not end up in the path.
            path = line.rstrip('\n').split('  ', 1)[1]
            groups[index].append(path)
        else:
            index += 1
            groups.append([])

    pprint(groups)

with open(DUPES_FILE, 'r') as dupes_file:
    process_dupes(dupes_file)

Is there a more concise way to write this?

Blaine Lafreniere asked Dec 19 '22 04:12


1 Answer

Read the entire file into a variable. Use split("\n\n") to separate it into the groups of duplicates, then split each group with split("\n") to get its lines, and finally split each line on the two spaces that separate the checksum from the path.

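def process_dupes(dupes_file):
    contents = dupes_file.read()
    # Blank lines separate the groups; each remaining line is "<md5sum>  <path>".
    # strip() removes leading/trailing blank lines, which would otherwise produce empty groups.
    return [[line.split("  ", 1)[1] for line in group.split("\n") if line != ""]
            for group in contents.strip().split("\n\n")]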
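Tracing each split on a shortened version of the question's sample input shows what every stage produces, and why the `if line != ""` filter is needed. The checksums `aaa` and `bbb` are placeholders, and `blocks`, `lines` and `paths` are illustrative names only.

```python
contents = "aaa  /var/www/one.png\naaa  /var/www/one-1.png\n\nbbb  /var/www/two.png\n"

# Step 1: a blank line is two newlines in a row, so split("\n\n") yields one string per group.
blocks = contents.split("\n\n")
# blocks == ['aaa  /var/www/one.png\naaa  /var/www/one-1.png', 'bbb  /var/www/two.png\n']

# Step 2: split one group into lines. The file's final newline leaves an empty string at the end.
lines = blocks[1].split("\n")
# lines == ['bbb  /var/www/two.png', '']

# Step 3: skip the empty strings and keep the text after the two-space separator.
paths = [line.split("  ", 1)[1] for line in lines if line != ""]
# paths == ['/var/www/two.png']

print(blocks, lines, paths)
```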
Barmar answered Dec 20 '22 20:12
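A different technique from the split-based answer: `itertools.groupby` can group the lines lazily, so the file never has to be read into memory at once. This is a sketch under the same assumption that every non-blank line has the form `<md5sum>  <path>`; `group_dupes_lazy` is a hypothetical name, and `io.StringIO` stands in for the real `dupes.txt`.

```python
import io
from itertools import groupby


def group_dupes_lazy(lines):
    """Yield one list of paths per run of non-blank lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # groupby(key=bool) alternates between runs of non-blank lines (key True)
    # and runs of blank lines (key False); only the non-blank runs are kept.
    for is_entry, run in groupby(stripped, key=bool):
        if is_entry:
            # Keep everything after the first two-space separator as the path.
            yield [line.split("  ", 1)[1] for line in run]


dupes_file = io.StringIO(
    "aaa  /var/www/one.png\n"
    "aaa  /var/www/one-1.png\n"
    "\n"
    "bbb  /var/www/two.png\n"
    "bbb  /var/www/two-1.png\n"
    "bbb  /var/www/two-2.png\n"
)
print(list(group_dupes_lazy(dupes_file)))
```

With a real file, `with open(DUPES_FILE) as f: groups = list(group_dupes_lazy(f))` should give the same nested list. Because consecutive blank lines collapse into a single run with key `False`, extra or trailing blank lines never produce empty groups.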