I wanted to pad a string with null characters ("\x00"). I know lots of ways to do this, so please do not answer with alternatives. What I want to know is: why does Python's str.format() method not allow padding with nulls?
Test cases:
>>> "{0:\x01<10}".format("bbb")
'bbb\x01\x01\x01\x01\x01\x01\x01'
This shows that hex-escaped characters work in general.
>>> "{0:\x00<10}".format("bbb")
'bbb       '
But "\x00" gets turned into a space ("\x20").
>>> "{0:{1}<10}".format("bbb","\x00")
'bbb       '
>>> "{0:{1}<10}".format("bbb",chr(0))
'bbb       '
Supplying the fill character as a separate argument makes no difference.
>>> "bbb" + "\x00" * 7
'bbb\x00\x00\x00\x00\x00\x00\x00'
This works, but doesn't use str.format().
>>> spaces = "{0: <10}".format("bbb")
>>> nulls = "{0:\x00<10}".format("bbb")
>>> spaces == nulls
True
Python is clearly substituting spaces (chr(0x20)) instead of nulls (chr(0x00)).
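The test cases above can be collapsed into a single check that reports which fill behaviour a given interpreter has. This is only a sketch: `describe_null_fill` is a name made up for illustration, and the result depends on the Python version it runs under (the session above is from 2.7; other versions may behave differently).

```python
def describe_null_fill():
    """Report what str.format() pads with when asked to fill with '\\x00'."""
    padded = "{0:\x00<10}".format("bbb")
    if padded == "bbb" + "\x00" * 7:
        return "nulls"    # the fill character was honoured
    if padded == "bbb" + " " * 7:
        return "spaces"   # the behaviour shown in the session above
    return "other"


print(describe_null_fill())
```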
Digging into the source code for Python 2.7, I found that the issue is in this section from ./Objects/stringlib/formatter.h, lines 718-722 (in version 2.7.3):
/* Write into that space. First the padding. */
p = fill_padding(STRINGLIB_STR(result), len,
                 format->fill_char=='\0'?' ':format->fill_char,
                 lpad, rpad);
The trouble is that a zero/null character ('\0') is used as the default when no padding character is specified, so an explicit null fill cannot be told apart from "no fill given". The default exists to enable this behavior:
>>> "{0:<10}".format("foo")
'foo       '
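The sentinel collision can be modelled in a few lines of Python. This is a simplified sketch, not the CPython implementation: `NO_FILL`, `parse_fill` and `render` are hypothetical names, and only the fill, alignment and width parts of a format spec are handled.

```python
# Simplified model of the 2.7 logic: '\0' doubles as "no fill char given".
NO_FILL = "\0"
ALIGNS = "<>^"


def parse_fill(spec):
    """Split a spec such as 'x<10' or '<10' into (fill, align, width)."""
    if len(spec) >= 2 and spec[1] in ALIGNS:
        fill, align, rest = spec[0], spec[1], spec[2:]
    elif spec and spec[0] in ALIGNS:
        fill, align, rest = NO_FILL, spec[0], spec[1:]
    else:
        fill, align, rest = NO_FILL, "<", spec
    return fill, align, int(rest) if rest else 0


def render(value, spec):
    fill, align, width = parse_fill(spec)
    # Mirrors: format->fill_char=='\0' ? ' ' : format->fill_char
    fill = " " if fill == NO_FILL else fill
    pad = max(width - len(value), 0)
    if align == "<":
        return value + fill * pad
    if align == ">":
        return fill * pad + value
    left = pad // 2  # '^': any odd leftover goes on the right
    return fill * left + value + fill * (pad - left)


print(repr(render("bbb", "\x01<10")))  # explicit \x01 fill is kept
print(repr(render("bbb", "\x00<10")))  # explicit \x00 is mistaken for "no fill"
print(repr(render("foo", "<10")))      # no fill given: spaces
```

Because the parser stores "nothing specified" as the same value a caller can legitimately pass, the renderer has no way to distinguish the two cases.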
It may be possible to set format->fill_char = ' '; as the default in parse_internal_render_format_spec() at ./Objects/stringlib/formatter.h:186, but there's some bit about backwards compatibility that checks for '\0' later on. In any case, my curiosity is satisfied. I will accept someone else's answer if it has more history or a better explanation of why than this.
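The idea of choosing the default at parse time can be illustrated with a small Python sketch. It is not the CPython code and not an actual patch: `pad_left_aligned` is a hypothetical helper that handles only left-aligned specs of the form "[fill]<width".

```python
def pad_left_aligned(value, spec):
    """Pad for '[fill]<width' specs, picking the default fill while parsing."""
    if len(spec) >= 2 and spec[1] == "<":
        # An explicit fill character was given; keep it, even if it is '\0'.
        fill, width = spec[0], int(spec[2:])
    else:
        # No fill given: default to a space here, so no sentinel is needed later.
        fill, width = " ", int(spec.lstrip("<"))
    return value + fill * max(width - len(value), 0)


print(repr(pad_left_aligned("bbb", "\x00<10")))
print(repr(pad_left_aligned("foo", "<10")))
```

With the default resolved up front, the rendering step never has to guess whether '\0' meant "unspecified".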