
Why is %s faster than %d for integer substitution in python?

I am going through the examples mentioned here and am looking at this one. I ran the sample below in IPython, and the result is consistent: "%d" is slower than "%s":

In [1]: def m1():
   ...:     return "%d" % (2*3/5)

In [2]: def m2():
   ...:     return "%s" % (2*3/5)

In [4]: %timeit m1()
1000000 loops, best of 3: 529 ns per loop

In [5]: %timeit m2()
1000000 loops, best of 3: 192 ns per loop

In [6]: from dis import dis

In [7]: dis(m1)
  2           0 LOAD_CONST               1 ('%d')
              3 LOAD_CONST               5 (6)
              6 LOAD_CONST               4 (5)
              9 BINARY_DIVIDE       
             10 BINARY_MODULO       
             11 RETURN_VALUE        

In [9]: dis(m2)
  2           0 LOAD_CONST               1 ('%s')
              3 LOAD_CONST               5 (6)
              6 LOAD_CONST               4 (5)
              9 BINARY_DIVIDE       
             10 BINARY_MODULO       
             11 RETURN_VALUE        

The two functions are nearly identical, and the disassembler output is the same apart from the format string constant, so why is "%s" faster than "%d"?

Anshul Goyal asked Jan 06 '15 07:01


1 Answer

This was discussed on Hacker News; I am just formatting @nikital's answer for SO:

The function PyString_Format in Objects/stringobject.c does the formatting for the % operator. For %s it calls _PyObject_Str which in turn calls str() on the object. For %d it calls formatint (located in the same file).

The str() implementation for ints is int_to_decimal_string (in Objects/intobject.c), and it's incredibly simple:

do {
    *--p = '0' + (char)(absn % 10);
    absn /= 10;
} while (absn);
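
For readers who don't read C, here is a rough Python rendering of that loop. It is only a sketch of the digit-peeling idea, not the CPython source: the function name is borrowed from the C code, and the sign handling and list-based buffer are my own simplifications.

```python
def int_to_decimal_string(n):
    """Render an int as decimal text by repeatedly peeling off the last digit."""
    absn = abs(n)
    digits = []  # stands in for the C buffer that is filled from the end
    while True:
        # Same step as `*--p = '0' + (char)(absn % 10);`
        digits.append(chr(ord("0") + absn % 10))
        # Same step as `absn /= 10;` (integer division)
        absn //= 10
        if absn == 0:  # mirrors `while (absn)`
            break
    if n < 0:
        digits.append("-")  # simplified sign handling, an assumption of this sketch
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(digits))
```

Each iteration costs one remainder and one division, with no format parsing at all.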

The code for formatint is far more complex, and it contains two calls to the native snprintf:

PyOS_snprintf(fmt, sizeof(fmt), "%s%%%s.%dl%c",
              sign, (flags&F_ALT) ? "#" : "",
              prec, type);
// ...
PyOS_snprintf(buf, buflen, fmt, -x);

The native snprintf is heavier because it has to parse a format string at run time and handle options such as precision and zero-padding.
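
The two-step shape of formatint can be imitated in Python, since Python's own % operator accepts (and ignores) the l length modifier. This is only an illustration of "format a format string, then format the number": the function name and parameters are hypothetical, and the sign is left empty here as a simplification of what the C code does.

```python
def formatint_sketch(x, prec=1, alt=False, conv="d"):
    """Imitate the two formatting passes of formatint (simplified)."""
    sign = ""  # assumption: the C code's sign handling is not reproduced here
    # Pass 1: build the real format string, e.g. "%.1ld".
    fmt = "%s%%%s.%dl%c" % (sign, "#" if alt else "", prec, conv)
    # Pass 2: use that freshly built string to format the number itself.
    text = fmt % x
    return fmt, text
```

For example, formatint_sketch(6) builds the format string "%.1ld" first and only then produces "6", so a single "%d" substitution pays for two rounds of format parsing.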

I believe this is why %d is slower. %s is a straight "take-the-remainder-and-divide-by-10" loop, while %d makes two library calls to the full-blown snprintf. However, I didn't actually profile the code because I don't have a debug build, so I might be completely wrong.
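
If you want to check the timing on your own interpreter, a small timeit harness along these lines should do. It is a sketch, not a rigorous benchmark: fmt_d, fmt_s and compare are names I made up, and the value is passed in as an argument (unlike m1/m2 in the question) so that a newer interpreter cannot fold the whole expression into a constant. The numbers in the question come from Python 2; other versions implement % differently and may not show the same gap.

```python
import timeit


def fmt_d(n):
    return "%d" % n


def fmt_s(n):
    return "%s" % n


def compare(n=1, number=200000, repeat=3):
    """Return (ns per call for %d, ns per call for %s), best of `repeat` runs."""
    best_d = min(timeit.repeat(lambda: fmt_d(n), number=number, repeat=repeat))
    best_s = min(timeit.repeat(lambda: fmt_s(n), number=number, repeat=repeat))
    # Convert total seconds for `number` calls into nanoseconds per call.
    return best_d / number * 1e9, best_s / number * 1e9
```

Calling compare() returns the two per-call times in nanoseconds; both include the same lambda call overhead, so only the difference between them is meaningful.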

elyase answered Sep 20 '22 08:09