Surprisingly, I find startswith
is slower than in
:
In [10]: s="ABCD"*10 In [11]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 307 ns per loop In [12]: %timeit "XYZ" in s 10000000 loops, best of 3: 81.7 ns per loop
As we all know, the in
operation needs to search the whole string and startswith
just needs to check the first few characters, so startswith
should be more efficient.
When s
is big enough, startswith
is faster:
In [13]: s="ABCD"*200 In [14]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 306 ns per loop In [15]: %timeit "XYZ" in s 1000000 loops, best of 3: 666 ns per loop
So it seems that calling startswith
has some overhead which makes it slower when the string is small.
And than I tried to figure out what's the overhead of the startswith
call.
First, I used an f
variable to reduce the cost of the dot operation - as mentioned in this answer - here we can see startswith
is still slower:
In [16]: f=s.startswith In [17]: %timeit f("XYZ") 1000000 loops, best of 3: 270 ns per loop
Further, I tested the cost of an empty function call:
In [18]: def func(a): pass In [19]: %timeit func("XYZ") 10000000 loops, best of 3: 106 ns per loop
Regardless of the cost of the dot operation and function call, the time of startswith
is about (270-106)=164ns, but the in
operation takes only 81.7ns. It seems there are still some overheads for startswith
, what's that?
Add the test result between startswith
and __contains__
as suggested by poke and lvc:
In [28]: %timeit s.startswith("XYZ") 1000000 loops, best of 3: 314 ns per loop In [29]: %timeit s.__contains__("XYZ") 1000000 loops, best of 3: 192 ns per loop
startsWith method is case sensitive. However there is a String. startsWithIgnoreCase method which as the name notes is case insensitive.
The startswith() method returns True if the string starts with the specified value, otherwise False.
JavaScript String startsWith()The startsWith() method returns true if a string starts with a specified string. Otherwise it returns false . The startsWith() method is case sensitive.
As already mentioned in the comments, if you use s.__contains__("XYZ")
you get a result that is more similar to s.startswith("XYZ")
because it needs to take the same route: Member lookup on the string object, followed by a function call. This is usually somewhat expensive (not enough that you should worry about of course). On the other hand, when you do "XYZ" in s
, the parser interprets the operator and can short-cut the member access to the __contains__
(or rather the implementation behind it, because __contains__
itself is just one way to access the implementation).
You can get an idea about this by looking at the bytecode:
>>> dis.dis('"XYZ" in s') 1 0 LOAD_CONST 0 ('XYZ') 3 LOAD_NAME 0 (s) 6 COMPARE_OP 6 (in) 9 RETURN_VALUE >>> dis.dis('s.__contains__("XYZ")') 1 0 LOAD_NAME 0 (s) 3 LOAD_ATTR 1 (__contains__) 6 LOAD_CONST 0 ('XYZ') 9 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 12 RETURN_VALUE
So comparing s.__contains__("XYZ")
with s.startswith("XYZ")
will produce a more similar result, however for your example string s
, the startswith
will still be slower.
To get to that, you could check the implementation of both. Interesting to see for the contains implementation is that it is statically typed, and just assumes that the argument is a unicode object itself. So this is quite efficient.
The startswith
implementation however is a “dynamic” Python method which requires the implementation to actually parse the arguments. startswith
also supports a tuple as an argument, which makes the whole start-up of the method a bit slower: (shortened by me, with my comments):
static PyObject * unicode_startswith(PyObject *self, PyObject *args) { // argument parsing PyObject *subobj; PyObject *substring; Py_ssize_t start = 0; Py_ssize_t end = PY_SSIZE_T_MAX; int result; if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end)) return NULL; // tuple handling if (PyTuple_Check(subobj)) {} // unicode conversion substring = PyUnicode_FromObject(subobj); if (substring == NULL) {} // actual implementation result = tailmatch(self, substring, start, end, -1); Py_DECREF(substring); if (result == -1) return NULL; return PyBool_FromLong(result); }
This is likely a big reason why startswith
is slower for strings for which a contains
is fast because of its simplicity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With