I noticed the following odd behaviour when timing enumerate with the default start parameter specified:
In [23]: %timeit enumerate([1, 2, 3, 4])
The slowest run took 7.18 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
In [24]: %timeit enumerate([1, 2, 3, 4], start=0)
The slowest run took 12.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.22 µs per loop
So, approximately a 2x slowdown for the case where start is specified.
The byte code issued for each case doesn't really indicate anything that would contribute to the significant difference in speed. Case in point: after examining the different calls with dis.dis, the additional instructions issued are:
18 LOAD_CONST 5 ('start')
21 LOAD_CONST 6 (0)
These, along with the CALL_FUNCTION having one keyword argument, are the only differences.
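A quick way to count those extra instructions yourself is to disassemble both call forms with the dis module (a small sketch; the exact opcode names vary between Python versions, but the keyword form always carries at least one extra instruction to push the keyword data before the call):

```python
import dis

def opnames(src):
    """Return the opcode names emitted for a compiled expression."""
    return [i.opname for i in dis.get_instructions(compile(src, "<timing>", "eval"))]

plain = opnames("enumerate([1, 2, 3, 4])")
with_kw = opnames("enumerate([1, 2, 3, 4], start=0)")

# The keyword form needs extra instructions to push 'start' and 0
# before the call -- the difference dis.dis shows above.
extra = len(with_kw) - len(plain)
print(extra)
```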
I tried tracing through the calls made in CPython's ceval with gdb, and both seem to use do_call in call_function and not some other optimization I could detect.
Now, I understand enumerate just creates an enumerate iterator, so we're dealing with object creation here (right?). I looked in Objects/enumobject.c trying to spot any differences when start was specified. The only thing that (I believe) differs is when start != NULL, in which case the following happens:
if (start != NULL) {
    start = PyNumber_Index(start);
    if (start == NULL) {
        Py_DECREF(en);
        return NULL;
    }
    assert(PyInt_Check(start) || PyLong_Check(start));
    en->en_index = PyInt_AsSsize_t(start);
    if (en->en_index == -1 && PyErr_Occurred()) {
        PyErr_Clear();
        en->en_index = PY_SSIZE_T_MAX;
        en->en_longindex = start;
    } else {
        en->en_longindex = NULL;
        Py_DECREF(start);
    }
}
Which doesn't look like something that would introduce a 2x slowdown. (I think; not sure.)
The previous code segments were executed on Python 3.5; similar results are present in 2.x too, though.
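As a side note on the en_longindex branch quoted above: it exists so that enumerate keeps working when start doesn't fit in a C ssize_t. A quick check:

```python
import sys

# When start exceeds sys.maxsize, enum_new stores it as a Python int
# (en_longindex) instead of a C ssize_t, and enumeration still works:
e = enumerate(["a", "b"], start=sys.maxsize + 1)
print(next(e))  # (sys.maxsize + 1, 'a')
```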
This is where I'm stuck and can't figure out where to look. This might just be a case of overhead from additional calls in the second case accumulating, but again, I'm not really sure. Does anybody know what might be the reason behind this?
One reason might be the call to PyNumber_Index when you specify a start, in the following part:
if (start != NULL) {
start = PyNumber_Index(start);
And if you take a look at the PyNumber_Index function in abstract.c you'll see the following comment at the top of the function:
/* Return a Python int from the object item.
Raise TypeError if the result is not an int
or if the object cannot be interpreted as an index.
*/
So this function has to check whether the object can be interpreted as an index, and raise the relevant error if it cannot. And if you look at the source carefully you'll see all of this checking and referencing, especially in the following part, which has to do a nested structure dereference in order to check the index type:
result = item->ob_type->tp_as_number->nb_index(item);
if (result &&
!PyInt_Check(result) && !PyLong_Check(result)) {
...
All of this takes some time to check and return the desired result.
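The Python-level counterpart of PyNumber_Index is operator.index, and it shows the same validation behaviour the C comment describes (a small illustration):

```python
import operator

# operator.index mirrors PyNumber_Index: it accepts anything that can be
# interpreted as an integer index and raises TypeError for everything else.
print(operator.index(0))     # plain ints pass straight through
print(operator.index(True))  # bool implements __index__, so it works too

try:
    operator.index(1.5)      # floats are not valid indices
except TypeError:
    print("TypeError for 1.5")
```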
But as @user2357112 mentioned, another and the most important reason is Python's keyword-argument matching.
If you time the function without a keyword argument you'll see the time difference decrease by approximately 2x:
~$ python -m timeit "enumerate([1, 2, 3, 4])"
1000000 loops, best of 3: 0.251 usec per loop
~$ python -m timeit "enumerate([1, 2, 3, 4],start=0)"
1000000 loops, best of 3: 0.431 usec per loop
~$ python -m timeit "enumerate([1, 2, 3, 4],0)"
1000000 loops, best of 3: 0.275 usec per loop
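The same comparison can be scripted with the timeit module. This is a sketch; the absolute numbers depend on your machine and Python version, but the ordering (no argument, then positional, then keyword) should generally be reproducible:

```python
import timeit

setup = "seq = [1, 2, 3, 4]"

# min() of several repeats is the conventional way to reduce timing noise.
plain = min(timeit.repeat("enumerate(seq)", setup=setup, number=100000, repeat=3))
pos = min(timeit.repeat("enumerate(seq, 0)", setup=setup, number=100000, repeat=3))
kw = min(timeit.repeat("enumerate(seq, start=0)", setup=setup, number=100000, repeat=3))

print(plain, pos, kw)
```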
The difference with the positional argument is:
>>> 0.251 - 0.275
-0.024
which seems to be due to PyNumber_Index.
It is probably just a combination of factors contributing to the overall slowdown.
When Python sees the CALL_FUNCTION opcode it will call call_function, as you already pointed out. After going through some if clauses, the call issued is x = do_call(func, pp_stack, na, nk);. Notice nk here, which holds the total count of keyword arguments (so, in the case of enumerate with start=0, nk == 1).
In do_call you'll see the following if clause:
if (nk > 0) {
kwdict = update_keyword_args(NULL, nk, pp_stack, func);
if (kwdict == NULL)
goto call_fail;
}
If the number of keyword arguments is not zero (nk > 0), update_keyword_args is called.
Now, update_keyword_args does what you would expect: if orig_kwdict is NULL (which it is; look at the call to update_keyword_args), it creates a new dictionary:
if (orig_kwdict == NULL)
kwdict = PyDict_New();
and then populates the dictionary with all values present in the value stack:
while (--nk >= 0) {
// copy from stack
These probably contribute significantly to the overall delay.
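A rough Python-level analogue of what update_keyword_args produces: keyword arguments reaching a C-level call get collected into a freshly created dict (the PyDict_New() above), and that allocation plus population is the extra work being discussed. A small sketch:

```python
# **kwargs below plays the role of the kwdict built by update_keyword_args:
# a brand-new dict populated from the keyword values on the stack.
def capture(*args, **kwargs):
    return args, kwargs

args, kwargs = capture([1, 2, 3, 4], start=0)
print(args, kwargs)
```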
The enumerate object: You're right about enum_new. If called with enumerate([1, 2, 3, 4], start=0), the variable start inside enum_new will have a value and therefore be != NULL. As a result the if clause will evaluate to true and the code inside it will execute, adding time to the call.
What is performed inside the if clause is not really heavy work, but it does contribute to the overall time required.
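For reference, the start-handling branch of enum_new can be paraphrased in Python roughly as follows (a hypothetical sketch using operator.index and sys.maxsize to stand in for PyNumber_Index and PY_SSIZE_T_MAX; the real C code also handles error states this sketch ignores):

```python
import operator
import sys

def normalise_start(start):
    # PyNumber_Index(start): validate/convert to an integer index.
    index = operator.index(start)
    if index > sys.maxsize:
        # PyInt_AsSsize_t overflowed: keep the Python int around
        # (en_index = PY_SSIZE_T_MAX, en_longindex = start).
        return sys.maxsize, index
    # Common case: store as a plain C ssize_t (en_longindex = NULL).
    return index, None

print(normalise_start(0))
print(normalise_start(sys.maxsize + 1))
```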
Additionally: you also have the two additional byte code instructions to consider. They might just be two, but they add to the overall time taken, because we're timing really quick things (in the range of ns). Again, insignificant from an overall standpoint, but parsing a call with keyword arguments requires, as before, a wee bit more time.
Finally: I might be missing some things, but overall these are some of the factors that, in conjunction, create overhead when creating a new enumerate object with start specified.