I've downloaded a Python 3.6 alpha build from the Python GitHub repository, and one of my favourite new features is literal string formatting. It can be used like so:
>>> x = 2
>>> f"x is {x}"
"x is 2"
This appears to do the same thing as using the format function on a str instance. However, one thing that I've noticed is that this literal string formatting is actually very slow compared to just calling format. Here's what timeit says about each method:
>>> import timeit
>>> x = 2
>>> timeit.timeit(lambda: f"X is {x}")
0.8658502227130764
>>> timeit.timeit(lambda: "X is {}".format(x))
0.5500578542015617
If I use a string as timeit's argument, my results still show the same pattern:
>>> timeit.timeit('x = 2; f"X is {x}"')
0.5786435347381484
>>> timeit.timeit('x = 2; "X is {}".format(x)')
0.4145195760771685
As you can see, using format takes roughly a third less time. I would expect the literal method to be faster because less syntax is involved. What is going on behind the scenes which causes the literal method to be so much slower?
As of Python 3.6, f-strings are a great new way to format strings. Not only are they more readable, more concise, and less prone to error than other ways of formatting, but they are also faster!
Python 3.6 f-strings have been shown to be the fastest string formatting method in microbenchmarks by Python core dev Raymond Hettinger: #Python's f-strings are amazingly fast!
Python supports C-style string formatting to create new, formatted strings. It uses the % operator and classic format specifiers such as %s and %d. Python 3.0 introduced the format function to provide advanced formatting options. Python f-strings are available since Python 3.6. The string has the f prefix and uses {} to evaluate expressions.
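As a quick illustration of the three styles just described, the sketch below formats the same value with each one. The variable names are placeholders chosen for this example only.

```python
# Three ways to build the same string; `x` is just an example value.
x = 2

percent_style = "x is %d" % x        # C-style formatting with the % operator
format_style = "x is {}".format(x)   # str.format()
fstring_style = f"x is {x}"          # formatted string literal, Python 3.6+

print(percent_style, format_style, fstring_style, sep="\n")
```

All three produce the same text; they differ in syntax and, as discussed in the rest of this page, in how much work the interpreter does at run time.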
Note: This answer was written for the Python 3.6 alpha releases. A new opcode added to 3.6.0b1 improved f-string performance significantly.
The f"..." syntax is effectively converted to a str.join() operation on the literal string parts around the {...} expressions, and the results of the expressions themselves passed through the object.__format__() method (passing any :.. format specification in). You can see this when disassembling:
>>> import dis
>>> dis.dis(compile('f"X is {x}"', '', 'exec'))
  1           0 LOAD_CONST               0 ('')
              3 LOAD_ATTR                0 (join)
              6 LOAD_CONST               1 ('X is ')
              9 LOAD_NAME                1 (x)
             12 FORMAT_VALUE             0
             15 BUILD_LIST               2
             18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             21 POP_TOP
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE
>>> dis.dis(compile('"X is {}".format(x)', '', 'exec'))
  1           0 LOAD_CONST               0 ('X is {}')
              3 LOAD_ATTR                0 (format)
              6 LOAD_NAME                1 (x)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 POP_TOP
             13 LOAD_CONST               1 (None)
             16 RETURN_VALUE
Note the BUILD_LIST and LOAD_ATTR .. (join) opcodes in that result. The new FORMAT_VALUE opcode takes the top of the stack plus a format value (parsed out at compile time) and combines these in an object.__format__() call.
So your example, f"X is {x}", is translated to:
''.join(["X is ", x.__format__('')])
Note that this requires Python to create a list object, and call the str.join() method.
The str.format() call is also a method call, and after parsing there is still a call to x.__format__('') involved, but crucially, there is no list creation involved here. It is this difference that makes the str.format() method faster.
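The equivalence described above can be checked at the level of results. This is a sketch of the behaviour, not of the bytecode itself; the `pi` value and the `.2f` specification are illustrative additions that do not appear in the question.

```python
x = 2
pi = 3.14159  # example value, used only to show a format specification

# No format specification: __format__ receives the empty string.
literal = f"X is {x}"
joined = "".join(["X is ", x.__format__("")])
method = "X is {}".format(x)

# With a specification after the colon, that text is what __format__ receives.
spec_literal = f"pi is {pi:.2f}"
spec_joined = "".join(["pi is ", pi.__format__(".2f")])

print(literal, joined, method)
print(spec_literal, spec_joined)
```

The explicit join version produces the same strings, but it has to allocate a list and look up and call str.join() along the way, which is the overhead the answer points to.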
Note that Python 3.6 has only been released as an alpha build; this implementation can still easily change. See PEP 494 – Python 3.6 Release Schedule for the time table, as well as Python issue #27078 (opened in response to this question) for a discussion on how to further improve the performance of formatted string literals.
Before 3.6 beta 1, the format string f'x is {x}' was compiled to the equivalent of ''.join(['x is ', x.__format__('')]). The resulting bytecode was inefficient for several reasons:
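1. It built a sequence of string fragments...
2. ...and that sequence was a list rather than a tuple (tuples are slightly faster to construct than lists).
3. It pushed an empty string onto the stack.
4. It looked up the join method on the empty string.
5. It invoked __format__ on even bare Unicode objects, for which __format__('') would always return self, or integer objects, for which __format__ with '' as the argument returned str(self).
6. The __format__ method isn't slotted.

However, for a more complex and longer string, the literal formatted strings would still have been faster than the corresponding '...'.format(...) call, because for the latter the string is interpreted every time the string is formatted.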
This very question was the prime motivator for issue 27078, which asked for a new Python bytecode opcode for joining string fragments into a string (the opcode takes one operand, the number of fragments on the stack; the fragments are pushed onto the stack in order of appearance, i.e. the last part is the topmost item). Serhiy Storchaka implemented this new opcode and merged it into CPython, so it has been available in Python 3.6 since beta 1 (and thus in Python 3.6.0 final).
As a result, literal formatted strings are now much faster than str.format(). They are also often much faster than old-style % formatting in Python 3.6, if you're just interpolating str or int objects:
>>> import timeit
>>> timeit.timeit("x = 2; 'X is {}'.format(x)")
0.32464265200542286
>>> timeit.timeit("x = 2; 'X is %s' % x")
0.2260766440012958
>>> timeit.timeit("x = 2; f'X is {x}'")
0.14437875000294298
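To repeat this comparison on your own interpreter, a small script along the following lines may help. It is only a sketch: the labels, the reduced `number`, and the use of `timeit.repeat` with `min` are choices made for this example, and absolute timings will differ between machines and Python versions.

```python
import timeit

# The statements mirror the ones timed above; `number` is lowered so the
# script finishes quickly.
statements = {
    "str.format": "x = 2; 'X is {}'.format(x)",
    "%-formatting": "x = 2; 'X is %s' % x",
    "f-string": "x = 2; f'X is {x}'",
}

# Take the best of several repeats to reduce noise from other processes.
results = {
    label: min(timeit.repeat(stmt, number=100_000, repeat=5))
    for label, stmt in statements.items()
}

# Print fastest first.
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{label:<14} {seconds:.4f} s")
```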
f'X is {x}' now compiles to:
>>> dis.dis("f'X is {x}'")
  1           0 LOAD_CONST               0 ('X is ')
              2 LOAD_NAME                0 (x)
              4 FORMAT_VALUE             0
              6 BUILD_STRING             2
              8 RETURN_VALUE
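The same check can be done programmatically with dis.get_instructions. This is a sketch that assumes CPython 3.6 (beta 1 or later) or newer; the exact set of opcodes differs between versions, so only the presence of BUILD_STRING and the absence of BUILD_LIST are inspected.

```python
import dis

# Compile the f-string as an expression and collect the opcode names.
code = compile("f'X is {x}'", "<f-string demo>", "eval")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

print(opnames)

# The fragments are concatenated by BUILD_STRING rather than by a
# str.join() call on a freshly built list.
uses_build_string = "BUILD_STRING" in opnames
uses_build_list = "BUILD_LIST" in opnames
print(uses_build_string, uses_build_list)
```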
The new BUILD_STRING opcode, along with an optimization in the FORMAT_VALUE code, completely eliminates the first 5 of the 6 sources of inefficiency. The __format__ method still isn't slotted, so it requires a dictionary lookup on the class, and calling it is therefore necessarily slower than calling __str__, but a call can now be completely avoided in the common cases of formatting int or str instances (not subclasses!) without formatting specifiers.
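The "not subclasses" caveat can be observed from Python code. The sketch below uses a hypothetical int subclass, `RecordingInt`, invented for this example; it records every `__format__` call so you can see that a subclass instance still goes through the method even when no format specifier is given. The skipped call for plain int and str cannot be observed this way, since built-in types cannot be patched.

```python
class RecordingInt(int):
    """Hypothetical int subclass that records each __format__ call."""

    calls = []

    def __format__(self, format_spec):
        RecordingInt.calls.append(format_spec)
        return super().__format__(format_spec)


n = RecordingInt(2)
result = f"X is {n}"

print(result)
print(RecordingInt.calls)  # the subclass's __format__ was called with ''
```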