TL;DR - ISSUE 21118
The long Story
Josh Rosenberg found that str.translate() is very slow compared to bytes.translate(). He raised an issue, stating that:
In Python 3, str.translate() is usually a performance pessimization, not optimization.
Why was str.translate() slow?
The main reason str.translate() was so slow is that every lookup went through a Python dictionary. Using maketrans made this problem worse. The equivalent approach with bytes builds a C array of 256 items for fast table lookup. Hence the use of a higher-level Python dict made str.translate() in Python 3.4 very slow.
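The difference in table types described above can be observed directly from Python. This is a minimal sketch; the variable names str_table and bytes_table are only illustrative.

```python
# Compare the lookup tables that the two translate() variants consume.
str_table = str.maketrans("abc", "xyz")
bytes_table = bytes.maketrans(b"abc", b"xyz")

# str.maketrans returns a dict keyed by Unicode code point.
print(type(str_table).__name__, str_table)

# bytes.maketrans returns a flat 256-byte table indexed by byte value.
print(type(bytes_table).__name__, len(bytes_table))

# Both produce the same logical result on equivalent input.
print("aabbcc".translate(str_table))
print(b"aabbcc".translate(bytes_table))
```

The dict needs a hash lookup per character, while the bytes table is indexed directly by byte value.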
The first approach was to add a small patch, translate_writer. However, the speed increase was not that pleasing. Soon another patch, fast_translate, was tested, and it yielded very nice results of up to 55% speedup.
The main change, as can be seen from the file, is that the Python dictionary lookup is replaced with a C-level lookup.
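To illustrate the idea of swapping a per-character dict lookup for a precomputed table, here is a rough pure-Python sketch. It is not CPython's actual C code: the function name translate_ascii_fast and the 128-slot table layout are assumptions made for illustration only, and a pure-Python version like this will not show the speedup of the C implementation.

```python
def translate_ascii_fast(text, mapping):
    """Illustrative sketch: table-driven translation for ASCII-only input.

    `mapping` follows the str.translate() convention: keys are code points,
    values are code points, strings, or None (meaning delete).
    """
    # Fall back to the generic path when any character is non-ASCII.
    if any(ord(ch) > 127 for ch in text):
        return text.translate(mapping)

    # Resolve the dict once per possible ASCII code point, not once per
    # character of the input.
    table = []
    for code_point in range(128):
        replacement = mapping.get(code_point, code_point)
        if replacement is None:
            table.append("")                # deletion
        elif isinstance(replacement, int):
            table.append(chr(replacement))  # code point -> character
        else:
            table.append(replacement)       # already a string

    # The per-character work is now a plain index into the table.
    return "".join(table[ord(ch)] for ch in text)


print(translate_ascii_fast("mJssissippi", {ord("J"): "i"}))
```

The point of the sketch is only the shape of the work: the dictionary is consulted a fixed number of times up front, and the per-character step becomes array indexing.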
The speeds are now almost the same as for bytes:
                                unpatched            patched
str.translate                   4.55125927699919     0.7898181750006188
str.translate from bytes trans  1.8910855210015143   0.779950579000797
A small note here is that the performance enhancement is only prominent for ASCII strings.
As J.F.Sebastian mentions in a comment below, before 3.5, translate worked the same way for both the ASCII and non-ASCII cases. From 3.5 onward, however, the ASCII case is much faster.
Earlier, ASCII and non-ASCII performance was almost the same; now there is a large difference between the two.
It can be an improvement from 71.6μs to 2.33μs, as seen in this answer.
The following timings demonstrate this (python3 here is Python 3.4):
python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop
python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop
Tabulation of the results (usec per loop):
            Python 3.4    Python 3.5
ASCII       101           2.3
Non-ASCII   91.2          117
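The same comparison can be run from a script with the timeit module, which makes it easier to repeat on whatever interpreter is at hand. This is a minimal sketch: the helper time_translate is hypothetical, and the tables here are keyed by code point (via ord), which is what str.translate() looks up. Absolute numbers will differ by machine and Python version.

```python
import timeit


def time_translate(text, table, number=1000):
    """Return the best per-call time in microseconds over three repeats."""
    timings = timeit.repeat(lambda: text.translate(table), number=number, repeat=3)
    return min(timings) / number * 1e6


ascii_text = "mJssissippi" * 100
non_ascii_text = "m\U0001F602ssissippi" * 100

# Tables keyed by code point, as str.translate() expects.
ascii_table = {ord("J"): "i"}
non_ascii_table = {ord("\U0001F602"): "i"}

print("ASCII:     %.2f usec per call" % time_translate(ascii_text, ascii_table))
print("Non-ASCII: %.2f usec per call" % time_translate(non_ascii_text, non_ascii_table))
```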