Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python string 'in' operator implementation algorithm and time complexity

I am thinking of how the in operator implement, for instance

>>> s1 = 'abcdef'
>>> s2 = 'bcd'
>>> s2 in s1
True

In CPython, which algorithm is used to implement the string match, and what is the time complexity? Is there any official document or wiki about this?

like image 251
mitchelllc Avatar asked Aug 09 '13 03:08

mitchelllc


People also ask

What is the time complexity of string slicing in Python?

the time complexity of slicing in python is O(k) please visit https://wiki.python.org/moin/TimeComplexity#list for more. the learning experience you deserve. the doubt. O(n) where n is the length of the slice.

Can operator be used with strings in Python?

In python, String operators represent the different types of operations that can be employed on the program's string type of variables. Python allows several string operators that can be applied on the python string are as below: Assignment operator: “=.” Concatenate operator: “+.”

What is the time complexity of in in Python?

The average time complexity of the in operator for sets is O(1) . It does not depend on the number of elements. The execution time does not change depending on the value to look for. If you want to repeat in operation for a list with many elements, it is faster to convert it to a set in advance.

How and * operators work with strings in Python?

The * operator can be used to repeat the string for a given number of times. Writing two string literals together also concatenates them like + operator. If we want to concatenate strings in different lines, we can use parentheses.


1 Answers

It's a combination of Boyer-Moore and Horspool.

You can view the C code here:

Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.

From the link above:

When designing the new algorithm, I used the following constraints:

  • should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
  • small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
  • sublinear search behaviour in good cases (O(n/m))
  • no worse than the current algorithm in worst case (O(nm))
  • should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
  • many real-life searches should be good, very few should be worst case
  • reasonably simple implementation
like image 90
arshajii Avatar answered Oct 23 '22 15:10

arshajii