I am using the following code to implement a function which finds all anagrams of string p in a string s.
class Solution(object):
def findAnagrams(self, s, p):
"""
:type s: str
:type p: str
:rtype: List[int]
"""
ans = list()
pcnt = collections.Counter(p)
for i in range(len(s)):
if collections.Counter(s[i:i+len(p)]) == pcnt:
ans.append(i)
return ans
when running on large length input string s, it gives me "time exceeds limit" error in the online code testing system. However the following code will work with no such issue:
class Solution(object):
def findAnagrams(self, s, p):
"""
:type s: str
:type p: str
:rtype: List[int]
"""
ls, lp = len(s), len(p)
cp = collections.Counter(p)
cs = collections.Counter()
ans = []
for i in range(ls):
cs[s[i]] += 1
if i >= lp:
cs[s[i - lp]] -= 1
if cs[s[i - lp]] == 0:
del cs[s[i - lp]]
if cs == cp:
ans.append(i - lp + 1)
return ans
Can I know why? It seems both solution uses two Counters of maximum size of len(p)?
To see why some code runs faster than other code, you should profile it. In Python, the easiest way to get started with profiling is to run:
python -m cProfile <script.py>
In my case, I wrote a simple script that calls either the slow solution or the fast solution:
# Pasted code from original question.
# Also renamed the slow version to `SlowSolution` and the fast version to `FastSolution`.
...
# solution = FastSolution()
solution = SlowSolution()
print(solution.findAnagrams('abcdefg' + 'a' * 10000, 'gfedcba' + 'a' * 10000))
Then I just ran the script using SlowSolution
and FastSolution
. Here's the output of my profiler results using SlowSolution
:
$ python -m cProfile counter.py
[0]
100204 function calls (100192 primitive calls) in 2.557 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10008 0.015 0.000 2.538 0.000 __init__.py:516(__init__)
10008 0.009 0.000 2.522 0.000 __init__.py:585(update)
7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals)
9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__)
20022 0.007 0.000 0.007 0.000 _weakrefset.py:70(__contains__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add)
10008 0.010 0.000 0.017 0.000 abc.py:178(__instancecheck__)
7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__)
1 0.000 0.000 2.557 2.557 counter.py:1(<module>)
1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution)
1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution)
1 0.017 0.017 2.556 2.556 counter.py:4(findAnagrams)
10008 2.490 0.000 2.490 0.000 {built-in method _collections._count_elements}
2 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__}
1 0.000 0.000 2.557 2.557 {built-in method builtins.exec}
7 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
10008 0.005 0.000 0.022 0.000 {built-in method builtins.isinstance}
8/2 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
30024 0.003 0.000 0.003 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
7 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects}
14 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
7 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}
and FastSolution
:
$ python -m cProfile counter.py
[0]
146 function calls (134 primitive calls) in 0.005 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 0.001 0.000 __init__.py:516(__init__)
7 0.000 0.000 0.000 0.000 __init__.py:536(__missing__)
2 0.000 0.000 0.001 0.000 __init__.py:585(update)
7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals)
9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__)
8 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add)
1 0.000 0.000 0.000 0.000 abc.py:178(__instancecheck__)
7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__)
1 0.000 0.000 0.005 0.005 counter.py:1(<module>)
1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution)
1 0.004 0.004 0.005 0.005 counter.py:18(findAnagrams)
1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution)
1 0.001 0.001 0.001 0.001 {built-in method _collections._count_elements}
2 0.000 0.000 0.000 0.000 {built-in method builtins.__build_class__}
1 0.000 0.000 0.005 0.005 {built-in method builtins.exec}
7 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
1 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
8/2 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
6 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
7 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects}
14 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
7 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects}
The output can be a little strange to read at first, but we're really interested in the tottime
column. That tells us how much total time we spent inside of a particular function.
As you can see, the script spends almost all of its time inside of {built-in method _collections._count_elements}
. That's an internal method used by a Counter
which we can infer gets called every time you create a counter (like collections.Counter(p)
).
To make the code faster, you should call collections.Counter(...)
fewer times and/or with shorter strings. In the slow version, you're counting len(p)
characters len(s)
times. This has a runtime of O(sp)
which is quadratic and explainswhy it's so slow on large inputs.
On the other hand, the faster solution counts each character of s
exactly once which gives it a runtime of O(s + p)
. This is much faster and will scale with much larger inputs.
For more info on profiling in Python, see How can you profile a python script?
There is more overhead in creating, counting, and comparing Counter
objects than for lists. That is the essence of what you are seeing. If you want a faster method still, you can accomplish the anagram finder by building the permutations of p
as a tuple, then checking the slices of s
against the tuple.
class Solution(object):
def findAnagrams(self, s, p):
"""
:type s: str
:type p: str
:rtype: List[int]
"""
ans = list()
pcnt = collections.Counter(p)
for i in range(len(s)):
if collections.Counter(s[i:i+len(p)]) == pcnt:
ans.append(i)
return ans
def findAnagrams2(self, s, p):
ls, lp = len(s), len(p)
cp = collections.Counter(p)
cs = collections.Counter()
ans = []
for i in range(ls):
cs[s[i]] += 1
if i >= lp:
cs[s[i - lp]] -= 1
if cs[s[i - lp]] == 0:
del cs[s[i - lp]]
if cs == cp:
ans.append(i - lp + 1)
return ans
def findAnagrams3(self, s, p):
p_all = tuple(''.join(x) for x in permutations(p,len(p)))
return [i for i in range(len(s)) if s[i:i+len(p)] in p_all]
Here is a short comparison of the 3 methods using timeit
in IPython:
In [33]: %%timeit
...: sol.findAnagrams('hello world he said eh', 'he')
...:
1000 loops, best of 3: 259 µs per loop
In [34]: %%timeit
...: sol.findAnagrams2('hello world he said eh', 'he')
...:
10000 loops, best of 3: 102 µs per loop
In [35]: %%timeit
...: sol.findAnagrams3('hello world he said eh', 'he')
...:
100000 loops, best of 3: 15.5 µs per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With