Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I remove duplicate characters and keep the unique one only in Perl?

How do I remove duplicate characters and keep the unique one only. For example, my input is:

EFUAHUU
UUUEUUUUH
UJUJHHACDEFUCU

Expected output is:

EFUAH
UEH
UJHACDEF

I came across perl -pe's/$1//g while/(.).*\/' which is wonderful but it is removing even the single occurrence of the character in output.

like image 419
manu Avatar asked Apr 06 '10 06:04

manu


People also ask

Does Set function remove duplicates?

Insert all array element in the Set. Set does not allow duplicates and sets like LinkedHashSet maintains the order of insertion so it will remove duplicates and elements will be printed in the same order in which it is inserted.


1 Answers

if Perl is not a must, you can also use awk. here's a fun benchmark on the Perl one liners posted against awk. awk is 10+ seconds faster for a file with 3million++ lines

$ wc -l <file2
3210220

$ time awk 'BEGIN{FS=""}{delete _;for(i=1;i<=NF;i++){if(!_[$i]++) printf $i};print""}' file2 >/dev/null

real    1m1.761s
user    0m58.565s
sys     0m1.568s

$ time perl -n -e '%seen=();' -e 'for (split //) {print unless $seen{$_}++;}'  file2 > /dev/null

real    1m32.123s
user    1m23.623s
sys     0m3.450s

$ time perl -ne '$_=reverse;s/(.)(?=.*?\1)//g;print scalar reverse;' file2 >/dev/null

real    1m17.818s
user    1m10.611s
sys     0m2.557s

$ time perl -ne'my%s;print grep!$s{$_}++,split//' file2 >/dev/null

real    1m20.347s
user    1m13.069s
sys     0m2.896s
like image 171
ghostdog74 Avatar answered Nov 03 '22 19:11

ghostdog74