I'm not very familiar with Python, but I want to remove duplicate lines from a string.
Example:
str = "aaa
aaa
aaa
abb
abb
ccc"
The lines are already sorted. The result I want is:
str = "aaa
abb
ccc"
I have millions of such lines. I know the long way of removing duplicates, but I would like to know whether there is a shorter form.
Don't use str as a variable name, since it's a builtin type; use '''...''' to wrap multi-line strings; and use sorted, set, and split in your case, e.g.:
In [895]: print('\n'.join(sorted(set(ss.split()))))
aaa
abb
ccc
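One caveat with the one-liner above: split() breaks on any whitespace, so a line that contains spaces would be cut into several items. A minimal sketch that splits on line boundaries only, assuming Python 3 (the function name unique_sorted_lines is made up for illustration):

```python
def unique_sorted_lines(text):
    """Return text with duplicate lines removed and the remaining lines sorted."""
    # splitlines() splits only at line boundaries, so lines with spaces stay whole.
    # set() drops duplicates (and the original order), sorted() restores an order.
    return "\n".join(sorted(set(text.splitlines())))


ss = """aaa
aaa
aaa
abb
abb
ccc"""

print(unique_sorted_lines(ss))
# aaa
# abb
# ccc
```

Because the set holds every distinct line in memory, this is fine for a few million short lines but is not a streaming solution.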
Thanks to @user2357112 for mentioning it: if you want to preserve the order in which the words appear, use OrderedDict:
In [910]: ss = '''zzz
...: aaa
...: aaa
...: aaa
...: abb
...: abb
...: ccc'''
In [911]: from collections import OrderedDict
...: print('\n'.join(OrderedDict.fromkeys(ss.split())))
zzz #here zzz comes first
aaa
abb
ccc
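On Python 3.7 and later, a plain dict also keeps insertion order, so the same idea can be sketched without importing OrderedDict. The function name unique_lines_in_order is hypothetical, and splitlines() is used here instead of split() so that lines containing spaces stay intact:

```python
def unique_lines_in_order(text):
    """Return text with duplicate lines removed, keeping first-seen order."""
    # dict keys are unique, and since Python 3.7 they iterate in insertion order,
    # so fromkeys() keeps the first occurrence of each line and drops the rest.
    return "\n".join(dict.fromkeys(text.splitlines()))


ss = """zzz
aaa
aaa
aaa
abb
abb
ccc"""

print(unique_lines_in_order(ss))
# zzz
# aaa
# abb
# ccc
```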
If the list is sorted, you don't need a set, because all the duplicates will be grouped together. Just keep track of the previous line:
prev_line = None
for line in lines:
    # A line is new only if it differs from the one just before it
    if line != prev_line:
        print(line)
        prev_line = line
(Here lines is assumed to be an iterable of the individual lines, e.g. ss.splitlines().)
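For already-sorted input, the same "compare with the previous line" idea is available in the standard library as itertools.groupby, which groups consecutive equal items. The sketch below is one possible way to use it lazily, so millions of lines never have to be held in memory at once; the function name unique_sorted_stream is an assumption for illustration, and io.StringIO stands in for a real file:

```python
import io
from itertools import groupby


def unique_sorted_stream(lines):
    """Yield each distinct line once; assumes equal lines are adjacent (sorted input)."""
    # groupby() yields one (key, group) pair per run of consecutive equal items,
    # so the key of each pair is the next unique line.
    for line, _run in groupby(lines):
        yield line


# io.StringIO stands in for an open file; a real file object iterates the same way.
source = io.StringIO("aaa\naaa\naaa\nabb\nabb\nccc\n")
for line in unique_sorted_stream(source):
    print(line, end="")
# aaa
# abb
# ccc
```

If the input is not sorted, this only removes adjacent repeats, so the set or dict approaches are the safer choice there.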