I am trying to figure out the lightest method for determining whether a string has any repeated characters, in the lightest way possible. I have tried searching for similar questions, but cant find any. It also needs to be the shorted way possible, as i will be checking quite a few strings (I can handle putting this into a loop, etc.)
For example:
a = "12348546478"
#code to check multiple characters
print(result)
Results: 8 was repeated, 4 was repeated
The code will check what character was repeated and print out what was repeated. I don't need to know how many times it was repeated, just whether it was or was not repeated.
Or alternatively you could do
len(set(x)) == len(x)
This returns a boolean, True
if the string has no repeating characters, False
otherwise.
The set
type can't have any duplicates so when the string gets turned into one, it gets broken down into characters. The difference in length shows how many repeated characters there were (But NOT the characters themselves)
You can use collections.Counter
:
>>> from collections import Counter
>>> [i for i,j in Counter(a).items() if j>1]
['4', '8']
Or you can use a custom function :
>>> def finder(s):
... seen,yields=set(),set()
... for i in s:
... if i in seen:
... if i not in yields:
... yield i
... yields.add(i)
... else :
... yields.add(i)
... else:
... seen.add(i)
...
>>> list(finder(a))
['4', '8']
Or use str.count
method in a set comprehension :
>>> set(i for i in a if a.count(i)>1)
set(['8', '4'])
A benchmark on all approaches, which shows that the last 2 way (custom function and set comprehensions are much faster than Counter
):
from timeit import timeit
s1="""
a = "12348546478"
[i for i,j in Counter(a).items() if j>1]
"""
s2="""
def finder(s):
seen,yields=set(),set()
for i in s:
if i in seen:
if i not in yields:
yield i
yields.add(i)
else :
yields.add(i)
else:
seen.add(i)
a = "12348546478"
list(finder(a))
"""
s3="""
a = "12348546478"
set(i for i in a if a.count(i)>1)
"""
print '1st: ' ,timeit(stmt=s1, number=100000,setup="from collections import Counter")
print '2nd : ',timeit(stmt=s2, number=100000)
print '3rd : ',timeit(stmt=s2, number=100000)
result :
1st: 0.726881027222
2nd : 0.265578985214
3rd : 0.26243185997
I also tried this for long string (a = "12348546478"*10000
) and still got the same result:
1st: 25.5780302721341
2nd : 11.8482989001177
3rd : 11.926538944245
Any way my suggestion is using the set comprehension which is more pythonic :
set(i for i in a if a.count(i)>1)
you can also use dictionary to get the count of unique characters as the key in a dictionary is always unique.
import collections
d = collections.defaultdict(int)
for c in a:
d[c] += 1
d will contain {'1': 1, '3': 1, '2': 1, '5': 1, '4': 3, '7': 1, '6': 1, '8': 2}
And the answer given by Kasramvd is a nice approach.
You can use the function below to check a character repetition. It returns True if there is no repetition of character and returns False otherwise.
Python Code
def isThereRepitition(x):
for char in x: #copies and iterates passing a value to char everytime
x=x[1:] #deletes the first character in the string x
if char in x: #checks if there is char in x string
return False
return True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With