More efficient way to make unicode escape codes

I am using python to automatically generate qsf files for Qualtrics online surveys. The qsf file requires unicode characters to be escaped using the \u+hex convention: 'слово' = '\u0441\u043b\u043e\u0432\u043e'. Currently, I am achieving this with the following expression:

'слово'.encode('ascii','backslashreplace').decode('ascii')

The output is exactly what I need, but since this is a two-step process, I wondered if there is a more efficient way to get the same result.
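A minimal sketch of that two-step approach wrapped in a reusable helper (the function name is hypothetical, purely for illustration):

def to_escaped(s):
    # Replace each non-ASCII character with its \uXXXX escape,
    # then decode back to str so the result can be handled as text.
    return s.encode('ascii', 'backslashreplace').decode('ascii')

print(to_escaped('слово'))  # -> \u0441\u043b\u043e\u0432\u043e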

asked Dec 04 '25 by reynoldsnlp

2 Answers

If you open your output file in 'wb' mode, then it accepts bytes rather than str (unicode) arguments:

s = 'слово'
with open('data.txt','wb') as f:
    f.write(s.encode('unicode_escape'))
    f.write(b'\n')  # add a line feed

This seems to do what you want:

$ cat data.txt
\u0441\u043b\u043e\u0432\u043e

and it avoids both the decode step and any translation that happens when writing unicode to a text stream.
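For the sample string, the one-step bytes result matches the original two-step round trip; a quick check (not part of the original answer) confirms it:

s = 'слово'
one_step = s.encode('unicode_escape')
two_step = s.encode('ascii', 'backslashreplace')

# Both produce the same escaped bytes for this string.
assert one_step == two_step == b'\\u0441\\u043b\\u043e\\u0432\\u043e'

(For strings containing backslashes or control characters the two encodings differ, since unicode_escape escapes those too.)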


Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.

%timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):

In [18]: f = open('data.txt', 'wb')

In [19]: %timeit f.write(s.encode('unicode_escape'))
The slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.55 µs per loop

In [20]: %timeit f.write(s.encode('ascii','backslashreplace'))
The slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.37 µs per loop

In [21]: f.close()

Curiously, the slowest-to-fastest-run spread reported by %timeit for encode('unicode_escape') is much larger than that for encode('ascii', 'backslashreplace') even though the per-loop time is faster, so be sure to test both in your environment.
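A standalone way to repeat the comparison outside IPython, if you want to verify on your own machine (the sample string and repeat count are arbitrary):

import timeit

s = 'слово' * 100

for expr in ("s.encode('unicode_escape')",
             "s.encode('ascii', 'backslashreplace')"):
    # Time 100,000 calls of each expression with the same input string.
    t = timeit.timeit(expr, globals={'s': s}, number=100_000)
    print(expr, '->', round(t, 3), 'seconds')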

answered Dec 07 '25 by Neapolitan


I doubt that this is a performance bottleneck in your application, but s.encode('unicode_escape') can be faster than s.encode('ascii', 'backslashreplace').

To avoid calling .encode() manually, you could pass the encoding to open():

with open(filename, 'w', encoding='unicode_escape') as file:
    print(s, file=file)

Note: it escapes non-printable ASCII characters too, e.g., a newline is written as \n, a tab as \t, etc.
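A small illustration of that caveat (a sketch with an arbitrary sample string): control characters are escaped along with the Cyrillic, so a tab or newline in the data ends up as the two characters \t or \n in the output rather than as real whitespace.

s = 'слово\tслово\n'

# The codec escapes the tab and the trailing newline as well:
print(s.encode('unicode_escape'))
# b'\\u0441\\u043b\\u043e\\u0432\\u043e\\t\\u0441\\u043b\\u043e\\u0432\\u043e\\n'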

answered Dec 07 '25 by jfs


