Remove duplicate lines from a string

Question

I'm not very familiar with Python, but I want to remove duplicate lines from a string.

For example:

str = "aaa
       aaa
       aaa
       abb
       abb
       ccc"

The lines are already sorted. The result I want is:

str = "aaa
       abb
       ccc"

I have millions of such lines. I know the long way of removing duplicates, but I'd like to know whether there is a shorter way.

zhangxaochen · Accepted Answer

Don't use str as a variable name, since it shadows the builtin type.
Use '''...''' to wrap multi-line strings.
Use sorted, set, and split in your case.

e.g.:

In [895]: print('\n'.join(sorted(set(ss.split()))))
aaa
abb
ccc
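
Note that split() with no arguments splits on any whitespace, so a line that itself contains spaces would be broken into several words. Below is a minimal sketch for Python 3, assuming each line should be treated as a whole and that leading indentation is not significant. The name unique_sorted_lines is made up for illustration.

```python
def unique_sorted_lines(text):
    """Return the distinct, non-empty lines of text in sorted order."""
    # splitlines() keeps each line whole; strip() drops the indentation
    # that a triple-quoted literal picks up.
    lines = (line.strip() for line in text.splitlines())
    return sorted({line for line in lines if line})


ss = '''aaa
       aaa
       aaa
       abb
       abb
       ccc'''
print('\n'.join(unique_sorted_lines(ss)))
```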

Thanks to @user2357112 for pointing this out: if you want to preserve the order in which the words appear, use OrderedDict:

In [910]: ss = '''zzz
     ...:        aaa
     ...:        aaa
     ...:        aaa
     ...:        abb
     ...:        abb
     ...:        ccc'''

In [911]: from collections import OrderedDict
     ...: print('\n'.join(OrderedDict.fromkeys(ss.split())))
zzz  # zzz stays first, as in the input
aaa
abb
ccc
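
On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea should work without the import. Here is a sketch under that assumption (unique_lines_in_order is a hypothetical helper name, not something from the answer):

```python
def unique_lines_in_order(text):
    """Return the distinct, non-empty lines of text in first-seen order."""
    stripped = (line.strip() for line in text.splitlines())
    # dict.fromkeys keeps only the first occurrence of each key, and
    # dicts preserve insertion order from Python 3.7 onward.
    return list(dict.fromkeys(line for line in stripped if line))


ss = '''zzz
       aaa
       aaa
       abb
       ccc'''
print('\n'.join(unique_lines_in_order(ss)))
```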

torquestomp · Answer

If the list is sorted, you don't need a set, because all the duplicates will be grouped together. Just track the previous line:

prev_line = None
for line in lines:
    if line != prev_line:
        print(line)  # output each new line once
    prev_line = line

(Here lines is any iterable of the already-sorted lines.)
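
For sorted input, the standard library's itertools.groupby already does this bookkeeping: it groups consecutive equal items, so taking one key per group drops the duplicates. Below is a sketch that works lazily, which may matter with millions of lines. The function name and the sample data are assumptions for illustration.

```python
from itertools import groupby


def unique_consecutive(lines):
    """Yield each line once, collapsing runs of identical lines."""
    # groupby only merges *adjacent* equal items, so this is a full
    # de-duplication only when the input is sorted.
    for key, _group in groupby(lines):
        yield key


sorted_lines = ['aaa', 'aaa', 'aaa', 'abb', 'abb', 'ccc']
print('\n'.join(unique_consecutive(sorted_lines)))
```

Because it is a generator, it can also be fed any other iterable of lines, such as an open file object, without building the whole list in memory first.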

Tags:

python

user1919035
