How does this function to remove duplicate characters from a string in Python work?

Tags:

python

I was looking up how to create a function that removes duplicate characters from a string in Python and found this on Stack Overflow:

    from collections import OrderedDict

    def remove_duplicates(foo):
        # Keep the first occurrence of each character and print them separated by spaces.
        print(" ".join(OrderedDict.fromkeys(foo)))

It works, but how? I've looked up what OrderedDict and fromkeys mean, but I can't find anything that explains how they work in this context.

asked Mar 31 '15 04:03 by Ben Cravens


1 Answer

I will give it a shot:

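An OrderedDict is a dictionary that remembers the order in which its keys were first inserted. Regular dictionaries did not guarantee this before Python 3.7 (since 3.7, a plain dict also preserves insertion order). If you look at the docstring of fromkeys, you find: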

OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.

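So the fromkeys class method creates an OrderedDict that uses the items of the input iterable S (in my example, the characters of a string) as keys. Keys in a dictionary are unique, so when an item in S repeats, no new key is added: the key stays at the position of its first occurrence. Every value is set to v, which defaults to None when it is not given.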

For example:

    from collections import OrderedDict

    s = "abbcdece"  # example string with duplicate characters

    print(OrderedDict.fromkeys(s))

This results in an OrderedDict:

    OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None)])
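A small sketch of the same behavior, assuming Python 3.7 or later (where a plain dict also keeps insertion order). It shows that a repeated character keeps the position of its first occurrence, that every value is None by default, and that the optional second argument v sets all values at once.

```python
from collections import OrderedDict

s = "abbcdece"

od = OrderedDict.fromkeys(s)
# Each character appears once, in order of first appearance.
print(list(od))            # ['a', 'b', 'c', 'd', 'e']

# Every value is None unless a second argument is given.
print(list(od.values()))   # [None, None, None, None, None]
with_zero = OrderedDict.fromkeys("ab", 0)  # both values set to 0

# On Python 3.7+, a plain dict gives the same key order.
d = dict.fromkeys(s)
print(list(d))             # ['a', 'b', 'c', 'd', 'e']
```

On older Python versions, only the OrderedDict line is guaranteed to give this order.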

Then " ".join(some_iterable) takes an iterable of strings and joins its elements, using a single space as the separator in this case. Only the keys are joined, because iterating over a dictionary yields its keys. For example:

    for k in OrderedDict.fromkeys(s):  # k is a key of the OrderedDict
        print(k)

Results in:

    a
    b
    c
    d
    e
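Two details sit behind this loop and the join call: iterating a dictionary yields its keys, and str.join accepts only string elements. The sketch below uses a made-up list of integers (an illustrative assumption, not part of the question) to show the map(str, ...) conversion that non-string keys would need before joining.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys("abbcdece")

# Iterating a dict (ordered or not) yields its keys, so these are equivalent.
keys_from_iteration = [k for k in od]
keys_explicit = list(od.keys())
print(keys_from_iteration == keys_explicit)  # True

# str.join needs string elements; non-strings must be converted first.
numbers = OrderedDict.fromkeys([3, 1, 3, 2, 1])
print(" ".join(map(str, numbers)))  # 3 1 2
```

Calling " ".join(numbers) directly would raise a TypeError, because the keys are integers.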

Consequently, the call to join:

    print(" ".join(OrderedDict.fromkeys(s)))

will print out:

    a b c d e
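The function in the question prints its result with a space between the characters. If you want the deduplicated string itself, a variant that returns it could look like the sketch below. The sep parameter is a hypothetical addition for illustration; it defaults to an empty string so the kept characters are joined back together without spaces.

```python
from collections import OrderedDict

def remove_duplicates(text, sep=""):
    """Return text with repeated characters removed, keeping first occurrences.

    sep is placed between the kept characters; the question's version uses " ".
    """
    return sep.join(OrderedDict.fromkeys(text))

print(remove_duplicates("abbcdece"))       # abcde
print(remove_duplicates("abbcdece", " "))  # a b c d e
```

Returning the string instead of printing it lets the caller decide what to do with the result.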

Using set

Sometimes, people use a set for this:

    print(" ".join(set(s)))
    # e.g. c a b d e (the order can differ from run to run)

But unlike std::set in C++, which keeps its elements in sorted order, a Python set makes no ordering guarantee at all. So a set gives you the unique values easily, but they may come out in a different order than in the original list or string (as in the example above).
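If you still want to use a set, you can preserve the original order by using the set only for fast membership tests and building the result in a separate list. This is a common alternative pattern rather than the approach from the question, and unique_in_order is a hypothetical name.

```python
def unique_in_order(text):
    """Remove repeated characters while keeping the order of first occurrence."""
    seen = set()     # used only for membership checks; its own order is irrelevant
    result = []      # the list records the order in which characters first appear
    for ch in text:
        if ch not in seen:
            seen.add(ch)
            result.append(ch)
    return "".join(result)

print(unique_in_order("abbcdece"))  # abcde
```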

Hope this helps a bit.

answered Oct 25 '22 04:10 by Marcin