I was looking up how to create a function that removes duplicate characters from a string in Python and found this on Stack Overflow:
from collections import OrderedDict
def remove_duplicates(foo):
    print(" ".join(OrderedDict.fromkeys(foo)))
It works, but how? I've searched what OrderedDict and fromkeys mean but I can't find anything that explains how it works in this context.
I will give it a shot:
An OrderedDict is a dictionary that remembers the order in which its keys were added. Before Python 3.7, normal dictionaries did not guarantee this. If you look at the documentation of fromkeys, you find:
OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
So the fromkeys class method creates an OrderedDict using the items in the input iterable S (in my example, the characters of a string) as keys. In a dictionary, keys are unique, so duplicate items in S are ignored.
For example:
s = "abbcdece" # example string with duplicate characters
print(OrderedDict.fromkeys(s))
This results in an OrderedDict:
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None)])
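To make the deduplication step more concrete, here is a rough sketch of what fromkeys does, written out by hand. The helper name fromkeys_by_hand is made up for illustration and is not part of the standard library; the real method may be implemented differently.

```python
from collections import OrderedDict

def fromkeys_by_hand(iterable, value=None):
    # Hypothetical helper, for illustration only: imitates OrderedDict.fromkeys.
    result = OrderedDict()
    for item in iterable:
        # Assigning to a key that already exists keeps its original position,
        # so repeated characters do not create new entries.
        result[item] = value
    return result

s = "abbcdece"
print(list(fromkeys_by_hand(s)))  # ['a', 'b', 'c', 'd', 'e']
```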
Then " ".join(some_iterable) takes an iterable and joins its elements, using a space as the separator in this case. It uses only the keys, because iterating over a dictionary yields its keys. For example:
for k in OrderedDict.fromkeys(s):  # k is a key of the OrderedDict
    print(k)
Results in:
a
b
c
d
e
Subsequently, the call to join:
print(" ".join(OrderedDict.fromkeys(s)))
will print out:
a b c d e
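If you would rather get the result back than print it, a minimal variant could look like the sketch below. The sep parameter is my own addition and not part of the original snippet; passing an empty string removes the spaces.

```python
from collections import OrderedDict

def remove_duplicates(text, sep=" "):
    """Return text with repeated characters dropped, keeping first occurrences."""
    # sep is a hypothetical extra parameter; the original always used a space.
    return sep.join(OrderedDict.fromkeys(text))

print(remove_duplicates("abbcdece"))          # a b c d e
print(remove_duplicates("abbcdece", sep=""))  # abcde
```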
Using set
Sometimes, people use a set for this:
print(" ".join(set(s)))
# c a b d e (the order may differ from run to run)
But unlike sets in C++, sets in Python do not guarantee order. So using a set will give you unique values easily, but they might be in a different order than they are in the original list or string (as in the above example).
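As a side note, on Python 3.7 and later a plain dict also keeps insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys here. The sketch below assumes such a version and uses a different sample string, "banana", to show that sorting a set gives alphabetical order rather than the original order.

```python
s = "banana"  # sample string chosen for this sketch

# Assumes Python 3.7+, where a plain dict preserves insertion order.
in_original_order = " ".join(dict.fromkeys(s))

# Sorting a set gives a stable result, but it is alphabetical,
# not the order of first appearance.
in_sorted_order = " ".join(sorted(set(s)))

print(in_original_order)  # b a n
print(in_sorted_order)    # a b n
```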
Hope this helps a bit.