compare file line by line python

Question

What is the most elegant way to go through a sorted list by its first index? Input:

Meni22   xxxx xxxx
Meni32_2 xxxx xxxx
Meni32_2 xxxx xxxx
Meni45_1 xxxx xxxx
Meni45_1 xxxx xxxx
Meni45   xxxx xxxx

Is it to go through it line by line:

list1 = []
list2 = []
for line in input:
    if line[0] not in list1:
        list1.append(line)
    else:
        list2.append(line)

Obviously, this example won't work: it adds the first match of line[0] and continues. I would rather have it go through the list, add the lines that occur only once to list1, and add the rest to list2.

After script:

List1:

Meni22   xxxx xxxx
Meni45   xxxx xxxx

List2: 

Meni45_1 xxxx xxxx
Meni45_1 xxxx xxxx
Meni32_2 xxxx xxxx
Meni32_2 xxxx xxxx

John La Rooy · Accepted Answer

Since the file is sorted, lines with the same first field are adjacent, so you can use itertools.groupby:

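from itertools import groupby

list1, list2 = res = [], []
with open('file1.txt') as fin:
    # groupby yields runs of consecutive lines sharing the same first field
    for k, g in groupby(fin, key=lambda x: x.partition(' ')[0]):
        g = list(g)
        # False indexes res[0] (list1), True indexes res[1] (list2);
        # extend() is used because res is a tuple, so res[i] += g would raise TypeError
        res[len(g) > 1].extend(g)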

Or, if you prefer, this longer version:

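from itertools import groupby

list1, list2 = [], []
with open('file1.txt') as fin:
    # Group consecutive lines by their first space-separated field
    for k, g in groupby(fin, key=lambda x: x.partition(' ')[0]):
        g = list(g)
        if len(g) > 1:
            list2 += g  # key occurs more than once
        else:
            list1 += g  # key occurs exactly once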
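
To try the groupby approach without a file, here is a minimal sketch that runs on the sample rows from the question held in a list. The names `rows` and `split_by_group_size` are illustrative and not part of the answer above. It splits on any whitespace rather than a single space, and it assumes there are no blank lines and that equal keys are adjacent.

```python
from itertools import groupby


def split_by_group_size(lines):
    """Return (singles, repeats) based on how often each first field occurs.

    Assumes lines with the same first field are adjacent (e.g. sorted input)
    and that no line is blank.
    """
    singles, repeats = [], []
    for _key, group in groupby(lines, key=lambda line: line.split()[0]):
        group = list(group)  # materialise the run so its length can be checked
        (repeats if len(group) > 1 else singles).extend(group)
    return singles, repeats


# Sample rows copied from the question
rows = [
    "Meni22   xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni32_2 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45_1 xxxx xxxx",
    "Meni45   xxxx xxxx",
]

singles, repeats = split_by_group_size(rows)
print(singles)
print(repeats)
```

Because groupby only merges consecutive equal keys, the same key appearing in two separate places would be treated as two groups, so this relies on the input being grouped.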

Ashwini Chaudhary · Answer

You can use collections.Counter:

from collections import Counter
lis1 = []
lis2 = []
with open("abc") as f:
    c = Counter(line.split()[0] for line in f)

for key,val in c.items():
    if val == 1:
        lis1.append(key)
    else:
        lis2.extend([key]*val)
print(lis1)
print(lis2)

output (on Python 3.7+, where dicts keep insertion order):

['Meni22', 'Meni45']
['Meni32_2', 'Meni32_2', 'Meni45_1', 'Meni45_1']

Edit:

from collections import defaultdict
lis1 = []
lis2 = []

with open("abc") as f:
    dic = defaultdict(list)
    for line in f:
        spl = line.split()
        dic[spl[0]].append(spl[1:])

for key,val in dic.items():
    if len(val) == 1:
        lis1.append(key)
    else:
        lis2.append(key)
print(lis1)
print(lis2)

print(dic["Meni32_2"])  # access the columns related to any key from the dict

output (on Python 3.7+, where dicts keep insertion order):

['Meni22', 'Meni45']
['Meni32_2', 'Meni45_1']
[['xxxx', 'xxxx'], ['xxxx', 'xxxx']]
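
Both snippets in this answer collect only the keys. If the full lines are wanted, as in the question, and the input is not guaranteed to be grouped, one option is to count the keys first and then split the lines in a second pass. The following is a minimal sketch under those assumptions; `partition_lines` and `rows` are illustrative names, and the sample rows are deliberately out of order.

```python
from collections import Counter


def partition_lines(lines):
    """Split lines into (unique, repeated) by how often their first field occurs.

    Works on unsorted input; assumes no line is blank.
    """
    lines = list(lines)  # allow two passes even if given an iterator
    counts = Counter(line.split()[0] for line in lines)
    unique = [line for line in lines if counts[line.split()[0]] == 1]
    repeated = [line for line in lines if counts[line.split()[0]] > 1]
    return unique, repeated


# Hypothetical, deliberately unsorted sample data
rows = [
    "Meni45_1 aaaa aaaa",
    "Meni22   bbbb bbbb",
    "Meni45_1 cccc cccc",
    "Meni45   dddd dddd",
]

list1, list2 = partition_lines(rows)
print(list1)
print(list2)
```

This reads the whole input into memory, which is usually fine for small files; for a large sorted file the groupby approach streams instead.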

Tags: python, list, compare

jester112358
