Why is string's startswith slower than in?

Tags:

Surprisingly, I find startswith is slower than in:

In [10]: s="ABCD"*10  In [11]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 307 ns per loop  In [12]: %timeit "XYZ" in s 10000000 loops, best of 3: 81.7 ns per loop

As we all know, the in operation needs to search the whole string and startswith just needs to check the first few characters, so startswith should be more efficient.

When s is big enough, startswith is faster:

In [13]: s="ABCD"*200  In [14]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 306 ns per loop  In [15]: %timeit "XYZ" in s 1000000 loops, best of 3: 666 ns per loop

So it seems that calling startswith has some overhead which makes it slower when the string is small.

And than I tried to figure out what's the overhead of the startswith call.

First, I used an f variable to reduce the cost of the dot operation - as mentioned in this answer - here we can see startswith is still slower:

In [16]: f=s.startswith  In [17]: %timeit f("XYZ") 1000000 loops, best of 3: 270 ns per loop

Further, I tested the cost of an empty function call:

In [18]: def func(a): pass  In [19]: %timeit func("XYZ") 10000000 loops, best of 3: 106 ns per loop

Regardless of the cost of the dot operation and function call, the time of startswith is about (270-106)=164ns, but the in operation takes only 81.7ns. It seems there are still some overheads for startswith, what's that?

Add the test result between startswith and __contains__ as suggested by poke and lvc:

In [28]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 314 ns per loop  In [29]: %timeit s.__contains__("XYZ") 1000000 loops, best of 3: 192 ns per loop

835

asked Aug 10 '15 10:08

WKPlus

1 Answers

As already mentioned in the comments, if you use s.__contains__("XYZ") you get a result that is more similar to s.startswith("XYZ") because it needs to take the same route: Member lookup on the string object, followed by a function call. This is usually somewhat expensive (not enough that you should worry about of course). On the other hand, when you do "XYZ" in s, the parser interprets the operator and can short-cut the member access to the __contains__ (or rather the implementation behind it, because __contains__ itself is just one way to access the implementation).

You can get an idea about this by looking at the bytecode:

>>> dis.dis('"XYZ" in s')   1           0 LOAD_CONST               0 ('XYZ')               3 LOAD_NAME                0 (s)               6 COMPARE_OP               6 (in)               9 RETURN_VALUE >>> dis.dis('s.__contains__("XYZ")')   1           0 LOAD_NAME                0 (s)               3 LOAD_ATTR                1 (__contains__)               6 LOAD_CONST               0 ('XYZ')               9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)              12 RETURN_VALUE

So comparing s.__contains__("XYZ") with s.startswith("XYZ") will produce a more similar result, however for your example string s, the startswith will still be slower.

To get to that, you could check the implementation of both. Interesting to see for the contains implementation is that it is statically typed, and just assumes that the argument is a unicode object itself. So this is quite efficient.

The startswith implementation however is a “dynamic” Python method which requires the implementation to actually parse the arguments. startswith also supports a tuple as an argument, which makes the whole start-up of the method a bit slower: (shortened by me, with my comments):

static PyObject * unicode_startswith(PyObject *self, PyObject *args) {     // argument parsing     PyObject *subobj;     PyObject *substring;     Py_ssize_t start = 0;     Py_ssize_t end = PY_SSIZE_T_MAX;     int result;     if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))         return NULL;      // tuple handling     if (PyTuple_Check(subobj)) {}      // unicode conversion     substring = PyUnicode_FromObject(subobj);     if (substring == NULL) {}      // actual implementation     result = tailmatch(self, substring, start, end, -1);     Py_DECREF(substring);     if (result == -1)         return NULL;     return PyBool_FromLong(result); }

This is likely a big reason why startswith is slower for strings for which a contains is fast because of its simplicity.

120

answered Oct 08 '22 11:10

poke

Related questions
                            
                                PIL /JPEG Library: "decoder jpeg not available"
                            
                                Sliding window of M-by-N shape numpy.ndarray
                            
                                python list comprehension with multiple 'if's
                            
                                Get the second largest number in a list in linear time
                            
                                How to write unicode strings into a file? [duplicate]
                            
                                Display fullscreen mode on Tkinter
                            
                                Inserting an item in a Tuple [duplicate]
                            
                                Python (pip) - RequestsDependencyWarning: urllib3 (1.9.1) or chardet (2.3.0) doesn't match a supported version
                            
                                xls to csv converter
                            
                                How to limit a number to be within a specified range? (Python)
                            
                                Clear all widgets in a layout in pyqt
                            
                                Generate unique id in django from a model field
                            
                                ImportError: No module named mysql.connector using Python2
                            
                                Enforcing python version in setup.py
                            
                                Efficient calculation of Fibonacci series
                            
                                Nose unable to find tests in ubuntu
                            
                                Running a test suite with over a million test cases
                            
                                Error packaging Kivy with numpy library for Android using buildozer
                            
                                gensim Doc2Vec vs tensorflow Doc2Vec
                            
                                Best practices for turning jupyter notebooks into python scripts

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is string's startswith slower than in?

Tags:

python

python-internals

startswith

cpython

python-2.7

WKPlus

People also ask

1 Answers

poke

Recent Activity

Donate For Us