
Why can't Python's string.format pad with "\x00"?

I wanted to pad a string with null characters ("\x00"). I know lots of ways to do this, so please do not answer with alternatives. What I want to know is: Why does Python's string.format() function not allow padding with nulls?

Test cases:

>>> "{0:\x01<10}".format("bbb")
'bbb\x01\x01\x01\x01\x01\x01\x01'

This shows that hex-escaped characters work in general.

>>> "{0:\x00<10}".format("bbb")
'bbb       '

But "\x00" gets turned into a space ("\x20").

>>> "{0:{1}<10}".format("bbb","\x00")
'bbb       '
>>> "{0:{1}<10}".format("bbb",chr(0))
'bbb       '

Supplying the fill character through a nested replacement field gives the same result.

>>> "bbb" + "\x00" * 7
'bbb\x00\x00\x00\x00\x00\x00\x00'

This works, but it doesn't use string.format.

>>> spaces = "{0: <10}".format("bbb")
>>> nulls  = "{0:\x00<10}".format("bbb")
>>> spaces == nulls
True

Python is clearly substituting spaces (chr(0x20)) instead of nulls (chr(0x00)).

asked May 24 '13 18:05 by bonsaiviking




1 Answer

Digging into the source code for Python 2.7, I found that the issue is in this section from ./Objects/stringlib/formatter.h, lines 718-722 (in version 2.7.3):

/* Write into that space. First the padding. */
p = fill_padding(STRINGLIB_STR(result), len,
                 format->fill_char=='\0'?' ':format->fill_char,
                 lpad, rpad);

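The trouble is that the null character ('\0') doubles as the internal default meaning "no padding character was specified". An explicit "\x00" fill is therefore indistinguishable from an omitted one and gets replaced with a space. The default exists to enable this behavior: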

>>> "{0:<10}".format("foo")
'foo       '
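To make the sentinel collision concrete, here is a minimal pure-Python sketch of the logic in the C excerpt above. It is only a model, not CPython's code: the names NO_FILL, parse_fill and pad_left_aligned are hypothetical, and the parser handles just the "fill, alignment, width" part of a format spec.

```python
# Hypothetical model of the padding step quoted above; not the real implementation.
# NO_FILL plays the role of '\0' stored in the C struct's fill_char field.
NO_FILL = "\x00"


def parse_fill(spec):
    """Return the fill character from a spec like 'x<10', or NO_FILL if absent."""
    # A fill character is present only when the second character is an alignment flag.
    if len(spec) >= 2 and spec[1] in "<>^=":
        return spec[0]
    return NO_FILL


def pad_left_aligned(value, width, fill_char):
    """Left-align value in width columns, mimicking the fill_char == '\\0' check."""
    # Same test as the C ternary: a null fill is treated as "use a space".
    effective = " " if fill_char == NO_FILL else fill_char
    return value + effective * max(0, width - len(value))


# An omitted fill and an explicit "\x00" fill both parse to the same value,
# so the padding step cannot tell them apart.
print(repr(pad_left_aligned("bbb", 10, parse_fill("<10"))))
print(repr(pad_left_aligned("bbb", 10, parse_fill("\x00<10"))))
print(repr(pad_left_aligned("bbb", 10, parse_fill("\x01<10"))))
```

In this model the first two calls print the same space-padded string, while the "\x01" fill survives, which matches the test cases in the question.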

It may be possible to set format->fill_char = ' '; as the default in parse_internal_render_format_spec() at ./Objects/stringlib/formatter.h:186, but a backwards-compatibility check later on also tests for '\0'. In any case, my curiosity is satisfied. I will accept someone else's answer if it offers more history or a better explanation of why than this one.
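As a rough illustration of the idea of not reusing '\0' as the default, the sketch below records "no fill given" separately from the fill value, using None. This is a hypothetical Python model with made-up function names, not a patch to formatter.h, and it ignores the backwards-compatibility check mentioned above.

```python
# Hypothetical sketch: keep "no fill specified" distinct from every real character.
def parse_fill(spec):
    """Return the explicit fill character, or None when the spec has none."""
    # A fill character is present only when the second character is an alignment flag.
    if len(spec) >= 2 and spec[1] in "<>^=":
        return spec[0]
    return None


def pad_left_aligned(value, width, fill_char=None):
    """Left-align value in width columns, defaulting to spaces only if no fill was given."""
    effective = " " if fill_char is None else fill_char
    return value + effective * max(0, width - len(value))


# With a separate "unspecified" marker, an explicit null fill is preserved.
print(repr(pad_left_aligned("bbb", 10, parse_fill("\x00<10"))))
print(repr(pad_left_aligned("foo", 10, parse_fill("<10"))))
```

Because None can never be typed as a fill character, the default and an explicit "\x00" no longer collide in this model.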

answered Oct 14 '22 06:10 by bonsaiviking