What is the big O of the following <code>if statement</code>? <pre class="prettyprint"><code>if "pl" in "apple": ... </code></pre> What is the overall big O of how python determines if the string "pl" is found in the string "apple" or any other substring in string search. Is this the most efficient way to test if a substring is in a string? Does it use the same algorithm as <code>.find()</code>?

The time complexity is O(N) on average, O(NM) worst case (N being the length of the longer string, M, the shorter string you search for). As of Python 3.10, heuristics are used to lower the worst-case scenario to O(N + M) by switching algorithms. The same algorithm is used for <code>str.index()</code>, <code>str.find()</code>, <code>str.__contains__()</code> (the <code>in</code> operator) and <code>str.replace()</code>; it is a simplification of the Boyer-Moore with ideas taken from the Boyer–Moore–Horspool and Sunday algorithms. See the original stringlib discussion post, as well as the <code>fastsearch.h</code> source code; until Python 3.10, the base algorithm has not changed since introduction in Python 2.5 (apart from some low-level optimisations and corner-case fixes). The post includes a Python-code outline of the algorithm: <blockquote> <pre class="prettyprint"><code>def find(s, p): # find first occurrence of p in s n = len(s) m = len(p) skip = delta1(p)[p[m-1]] i = 0 while i <= n-m: if s[i+m-1] == p[m-1]: # (boyer-moore) # potential match if s[i:i+m-1] == p[:m-1]: return i if s[i+m] not in p: i = i + m + 1 # (sunday) else: i = i + skip # (horspool) else: # skip if s[i+m] not in p: i = i + m + 1 # (sunday) else: i = i + 1 return -1 # not found </code></pre> </blockquote> as well as speed comparisons. In Python 3.10, the algorithm was updated to use an enhanced version of the Crochemore and Perrin's Two-Way string searching algorithm for larger problems (with <code>p</code> and <code>s</code> longer than 100 and 2100 characters, respectively, with <code>s</code> at least 6 times as long as <code>p</code>), in response to a pathological edgecase someone reported. The commit adding this change included a write-up on how the algorithm works. The Two-way algorithm has a worst-case time complexity of O(N + M), where O(M) is a cost paid up-front to build a shift table from the <code>s</code> search needle. Once you have that table, this algorithm does have a best-case performance of O(N/M).

Runtime of python's if substring in string

Tags:

What is the big O of the following if statement?

if "pl" in "apple":    ...

What is the overall big O of how python determines if the string "pl" is found in the string "apple"

or any other substring in string search.

Is this the most efficient way to test if a substring is in a string? Does it use the same algorithm as .find()?

805

asked Feb 05 '16 09:02

Liondancer

1 Answers

The time complexity is O(N) on average, O(NM) worst case (N being the length of the longer string, M, the shorter string you search for). As of Python 3.10, heuristics are used to lower the worst-case scenario to O(N + M) by switching algorithms.

The same algorithm is used for str.index(), str.find(), str.__contains__() (the in operator) and str.replace(); it is a simplification of the Boyer-Moore with ideas taken from the Boyer–Moore–Horspool and Sunday algorithms.

See the original stringlib discussion post, as well as the fastsearch.h source code; until Python 3.10, the base algorithm has not changed since introduction in Python 2.5 (apart from some low-level optimisations and corner-case fixes).

The post includes a Python-code outline of the algorithm:

def find(s, p):     # find first occurrence of p in s     n = len(s)     m = len(p)     skip = delta1(p)[p[m-1]]     i = 0     while i <= n-m:         if s[i+m-1] == p[m-1]: # (boyer-moore)             # potential match             if s[i:i+m-1] == p[:m-1]:                 return i             if s[i+m] not in p:                 i = i + m + 1 # (sunday)             else:                 i = i + skip # (horspool)         else:             # skip             if s[i+m] not in p:                 i = i + m + 1 # (sunday)             else:                 i = i + 1     return -1 # not found

as well as speed comparisons.

In Python 3.10, the algorithm was updated to use an enhanced version of the Crochemore and Perrin's Two-Way string searching algorithm for larger problems (with p and s longer than 100 and 2100 characters, respectively, with s at least 6 times as long as p), in response to a pathological edgecase someone reported. The commit adding this change included a write-up on how the algorithm works.

The Two-way algorithm has a worst-case time complexity of O(N + M), where O(M) is a cost paid up-front to build a shift table from the s search needle. Once you have that table, this algorithm does have a best-case performance of O(N/M).

answered Sep 21 '22 05:09

Martijn Pieters

Related questions
                            
                                Transition UIButton image from one image to another
                            
                                TensorFlow: Opening log data written by SummaryWriter
                            
                                Display content under toolbar
                            
                                Efficiency of services like Cloudinary, Imgix
                            
                                Implement Hashmap with different value types in Kotlin
                            
                                Socket.io emit on Express Route
                            
                                Intersection of two lists including duplicates?
                            
                                Sort version-dotted number strings in Javascript?
                            
                                Extract numbers from a field in PostgreSQL
                            
                                How to create an application-specific config file for TypeScript?
                            
                                Reading CSV File In Android App
                            
                                How to import react native project in android studio

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With