I have a file that contains a list of duplicate files, each with a unique name. Groups of duplicates are separated by a blank line.
For example:
<md5sum> /var/www/one.png
<md5sum> /var/www/one-1.png

<md5sum> /var/www/two.png
<md5sum> /var/www/two-1.png
<md5sum> /var/www/two-2.png
The goal is to end up with the following:
[
    [
        '/var/www/one.png',
        '/var/www/one-1.png'
    ],
    [
        '/var/www/two.png',
        '/var/www/two-1.png',
        '/var/www/two-2.png'
    ]
]
This is the output of a command I ran earlier. Now I need to process it, and I came up with the following code for starters:
from pprint import pprint

DUPES_FILE = './dupes.txt'

def process_dupes(dupes_file):
    groups = [[]]
    index = 0
    for line in dupes_file:
        if line != '\n':
            # Take the path after the checksum; strip() drops the trailing newline
            path = line.strip().split(' ')[1]
            groups[index].append(path)
        else:
            # A blank line marks the start of the next group of duplicates
            index += 1
            groups.append([])
    pprint(groups)

with open(DUPES_FILE, 'r') as dupes_file:
    process_dupes(dupes_file)
Is there a more concise way to write this?
Read the entire file into a variable. Use split("\n\n") to separate it into the duplicate groups, then split each group with split("\n") to get the individual lines, and finally split each line with split(" ") to pull out the path.
def process_dupes(dupes_file):
    contents = dupes_file.read()
    groups = [[line.split(" ")[1] for line in group.split("\n") if line != ""]
              for group in contents.split("\n\n")]
    pprint(groups)
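This drops into the question's driver unchanged. Assuming the same DUPES_FILE and sample data as above, calling it would print roughly:

with open(DUPES_FILE, 'r') as dupes_file:
    process_dupes(dupes_file)

# Expected pprint output for the sample file:
# [['/var/www/one.png', '/var/www/one-1.png'],
#  ['/var/www/two.png', '/var/www/two-1.png', '/var/www/two-2.png']]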
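If the blank-line separators are not guaranteed, here is a sketch of an alternative: group on the checksum itself, since duplicate files share the same md5sum. This assumes each line has the form "<md5sum> <path>" with whitespace in between:

from collections import defaultdict

def process_dupes(dupes_file):
    # Map each checksum to the list of paths that share it
    groups = defaultdict(list)
    for line in dupes_file:
        if line.strip():                          # ignore blank separator lines
            checksum, path = line.split(None, 1)  # split once on whitespace
            groups[checksum].append(path.strip())
    return list(groups.values())

This produces the same nested-list structure regardless of how the groups are delimited in the file.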