I need to filter strings by the criterion that they contain no character twice.
The calling code would look like this:
>>> candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]
>>> result_strings = [s for s in candidate_strings if unique_chars(s)]
>>> print(result_strings)
['barfnehg', 'bazfnehg']
I implemented a naive version, simply iterating the string:
def unique_chars_naive(string_given):
    """
    Checks if a given string contains only unique characters.
    This version iterates the given string, remembering every character seen so far.
    """
    chars_seen = []
    for char in string_given:
        if char in chars_seen:
            return False
        chars_seen.append(char)
    return True
My next-best idea was to use a set, so I implemented that:
def unique_chars_set(string_given):
    """
    Checks if a given string contains only unique characters.
    This version exploits that a set contains only unique entries.
    """
    return len(string_given) == len(set(string_given))
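For reference, on the sample strings from above it behaves as expected:
>>> unique_chars_set("foobnehg")
False
>>> unique_chars_set("barfnehg")
True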
After saving the functions to a file UniqueCharacters.py, I timed them:
$ python3 -m timeit -n 100000 --setup='import UniqueCharacters; candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]' '[UniqueCharacters.unique_chars_naive(s) for s in candidate_strings]'
100000 loops, best of 3: 20.3 usec per loop
$ python3 -m timeit -n 100000 --setup='import UniqueCharacters; candidate_strings = ["foobnehg", "barfnehg", "bazfnehg"]' '[UniqueCharacters.unique_chars_set(s) for s in candidate_strings]'
100000 loops, best of 3: 17.7 usec per loop
This shows that unique_chars_set is faster by about 15 % for this dataset.
Is there a faster way to do this? With regular expressions maybe? Is there some method in the standard library that does this?
Let me start off by saying that I suspect that you are optimizing when you don't need to. Python is a high-level language that supports thinking about computation in a high-level manner. A solution that is readable, elegant, and reusable is often going to be better than one that is blazingly fast, but hard to understand.
When, and only when, you determine that speed is an issue, then you should proceed with the optimizations. Perhaps even write a C extension for the computationally intense parts.
That being said, here's a comparison of a few techniques:
def unique_chars_set(s):
    return len(s) == len(set(s))

def unique_chars_frozenset(s):
    return len(s) == len(frozenset(s))
from collections import Counter

def unique_chars_counter(s):
    # Assumes a non-empty string; unique iff the most common
    # character occurs exactly once.
    return Counter(s).most_common(1)[0][1] == 1
def unique_chars_sort(s):
    ss = ''.join(sorted(s))
    prev = ''
    for c in ss:
        if c == prev:
            return False
        prev = c
    return True
def unique_chars_bucket(s):
    # Assumes ord(c) < 256, i.e. ASCII or Latin-1 input.
    buckets = 256 * [False]
    for c in s:
        o = ord(c)
        if buckets[o]:
            return False
        buckets[o] = True
    return True
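One more variant worth sketching, not included in the timings below: an early-exit check that combines the naive loop's short-circuiting with O(1) set lookups (the name unique_chars_seen_set is mine):
def unique_chars_seen_set(s):
    # Hypothetical variant: stops at the first repeated character
    # instead of always building a set of the whole string.
    seen = set()
    for c in s:
        if c in seen:
            return False
        seen.add(c)
    return True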
And here is the performance comparison (in IPython):
In [0]: %timeit -r10 [unique_chars_set(s) for s in candidate_strings]
100000 loops, best of 10: 6.63 us per loop
In [1]: %timeit -r10 [unique_chars_frozenset(s) for s in candidate_strings]
100000 loops, best of 10: 6.81 us per loop
In [2]: %timeit -r10 [unique_chars_counter(s) for s in candidate_strings]
10000 loops, best of 10: 83.1 us per loop
In [3]: %timeit -r10 [unique_chars_sort(s) for s in candidate_strings]
100000 loops, best of 10: 13.1 us per loop
In [4]: %timeit -r10 [unique_chars_bucket(s) for s in candidate_strings]
100000 loops, best of 10: 15 us per loop
Conclusion: set is elegant and faster than many other obvious methods. But the differences are so small that it hardly matters anyway.
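As for the regular-expression idea from the question: a backreference can detect a repeated character, though I would not expect it to beat set on short strings, since backtracking makes it worst-case quadratic. A sketch (the name unique_chars_regex is mine):
import re

def unique_chars_regex(s):
    # Hypothetical sketch: a character captured by (.) that appears
    # again later (\1) means s has a duplicate; re.DOTALL lets '.'
    # match newlines as well.
    return re.search(r'(.).*?\1', s, re.DOTALL) is None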
For more benchmarks, see @FrancisAvila's answer.