I have a text file with some 1,200 rows. Some of them are duplicates.
How can I find the duplicate lines in the file (ignoring case) and print each duplicate's text on the screen, so I can go and locate it? I don't want to delete anything, just find out which lines are duplicated.
This is pretty easy with a set:
with open('file') as f:
    seen = set()
    for line in f:
        line_lower = line.lower()
        if line_lower in seen:
            print(line)
        else:
            seen.add(line_lower)
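Not part of the original answer, but since you want to go and find the duplicates afterwards, it may help to print the line numbers too. A minimal sketch of the same set idea using enumerate; the file name 'file' and the first_seen dict are placeholders:
with open('file') as f:
    first_seen = {}  # lowercased text -> line number where it first appeared
    for num, line in enumerate(f, 1):
        key = line.strip().lower()
        if not key:
            continue  # ignore blank lines
        if key in first_seen:
            print(f"line {num} duplicates line {first_seen[key]}: {line.strip()}")
        else:
            first_seen[key] = num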
Since there are only 1,200 lines, you can also use collections.Counter():
>>> from collections import Counter
>>> with open('data1.txt') as f:
...     c = Counter(line.strip().lower() for line in f if line.strip())  # case-insensitive counts
...     for line in c:
...         if c[line] > 1:
...             print(line)
...
If data1.txt is something like this:
ABC
abc
aBc
CAB
caB
bca
BcA
acb
output is:
abc
cab
bca
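Not in the original answer, but if you also want to see how many times each duplicate occurs, the same Counter exposes most_common(); continuing the session above, with the sample data it should print something like:
>>> for line, count in c.most_common():
...     if count > 1:
...         print(line, count)
...
abc 3
cab 2
bca 2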