KeyError when using hex, octal, or binary integer as argument index with Python's str.format() method

Simple use of Python's str.format() method:

>>> '{0}'.format('zero')
'zero'

Hex, octal, and binary literals do not work:

>>> '{0x0}'.format('zero')
KeyError: '0x0'
>>> '{0o0}'.format('zero')
KeyError: '0o0'
>>> '{0b0}'.format('zero')
KeyError: '0b0'

According to the replacement field grammar, though, they should:

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | integer]
attribute_name    ::=  identifier
element_index     ::=  integer | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s"
format_spec       ::=  <described in the next section>

The integer grammar is as follows:

longinteger    ::=  integer ("l" | "L")
integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
nonzerodigit   ::=  "1"..."9"
octdigit       ::=  "0"..."7"
bindigit       ::=  "0" | "1"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"

Have I misunderstood the documentation, or does Python not behave as advertised? (I'm using Python 2.7.)

davidchambers asked Oct 02 '22 18:10


1 Answer

This looks like a mistake in the grammar. The accompanying text does nothing to clarify it: it describes the arg_name only as "a number or an identifier" and explains how it is interpreted when it is a number.

Testing it out, the field is clearly not treated as an integer:

>>> '{08}'.format(*range(10)) # should be SyntaxError
'8'
>>> '{010}'.format(*range(10)) # should be '8'
'10'
>>> '{-1}'.format(*range(10)) # should be '9', but looked up as a string
KeyError: '-1'
>>> '{1 }'.format(*range(10)) # should be '1', but looked up as a string
KeyError: '1 '
>>> '{10000000000000000000}'.format(1) # should be IndexError
ValueError: Too many decimal digits in format string
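To confirm that these field names really are looked up as strings, you can supply a keyword argument with exactly that name and watch the lookup succeed. This is only a quick check, and it assumes Python 3, where the behavior appears to be the same as in 2.7:

```python
# Field names that are not plain decimal digits are looked up as keyword names,
# so passing a keyword with that exact (odd-looking) name satisfies the lookup.
hex_result = '{0x0}'.format(**{'0x0': 'zero'})
neg_result = '{-1}'.format(**{'-1': 'last'})

# Leading zeros are accepted and read as decimal, not octal.
padded_result = '{010}'.format(*range(11))

print(hex_result, neg_result, padded_result)
```

Nobody should write format strings like this in real code; it just shows which branch the parser takes.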

Looking at the code, it doesn't borrow from the Python parser to parse format strings; it uses custom parsing. To interpret an arg_name as a number, it calls a get_integer function that converts one digit at a time, multiplying the running total by ten and adding the digit, until the field is over or the total gets within a digit of PY_SSIZE_T_MAX.
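Here is a rough Python rendering of that logic. It is a sketch of my reading of the C code, not a copy of it: the name get_integer_sketch is made up, sys.maxsize stands in for PY_SSIZE_T_MAX, and I only accept ASCII digits, which may be narrower than what the C code accepts.

```python
import sys


def get_integer_sketch(text):
    """Return the index for an all-digit field name, or -1 if it is not one."""
    if not text:
        return -1  # an empty field name is not a number
    accumulator = 0
    for ch in text:
        if ch not in '0123456789':
            return -1  # any non-digit means "treat the name as a string key"
        digit = ord(ch) - ord('0')
        # Refuse to go on if one more digit could overflow the index type.
        if accumulator > (sys.maxsize - digit) // 10:
            raise ValueError('Too many decimal digits in format string')
        accumulator = accumulator * 10 + digit
    return accumulator


for name in ['0', '08', '010', '0x0', '-1', '1 ']:
    print(repr(name), get_integer_sketch(name))
```

With this reading, '08' and '010' come out as 8 and 10 because there is no notion of a base prefix at all, and '0x0', '-1', and '1 ' fall through to the string lookup that raises KeyError.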

PEP 3101 suggests that this is intentional:

Simple field names are either names or numbers. If numbers, they must be valid base-10 integers …

It doesn't specifically say that the number must not be too close to the maximum index value, nor that negative indices can't be used. But most of the other quirks would be explained by the "valid base-10 integer" description in place of plain "integer". In fact, describing arg_name as digit+ instead of integer would account for all of the quirks.
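One way to check the digit+ idea is to compare it against what str.format() actually does. The sketch below assumes Python 3; is_positional_index and treated_as_index are hypothetical helper names, and the list of names is just the cases from this page.

```python
import re

# The proposed rule: an arg_name is a positional index iff it is digit+.
_DIGITS_ONLY = re.compile(r'[0-9]+')


def is_positional_index(arg_name):
    """Prediction from the digit+ rule."""
    return _DIGITS_ONLY.fullmatch(arg_name) is not None


def treated_as_index(arg_name):
    """Observation: does str.format() resolve the name positionally?"""
    try:
        ('{' + arg_name + '}').format(*range(11))
    except KeyError:
        return False  # looked up as a keyword name and not found
    return True


names = ['0', '08', '010', '0x0', '0o0', '0b0', '-1', '1 ']
for name in names:
    print(repr(name), is_positional_index(name), treated_as_index(name))
```

If the two columns agree for every name, the digit+ description matches the implementation at least for these cases.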

The element_index is parsed in exactly the same way as the arg_name. Issue #8985 says that element_index intentionally "… uses the narrowest possible definition for integer indexes, in order to pass all other strings to mappings." Whether that's also intentional for arg_name, or whether it's an unintended consequence of using the same code, I'm not sure.
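The "pass all other strings to mappings" behavior is easy to see with a small example. Again this is just an illustration, assuming Python 3:

```python
# An all-digit element_index is converted to int, so it works on sequences.
from_list = '{0[1]}'.format(['a', 'b'])

# Anything else is passed through as a string key, so it works on mappings.
from_dict = '{0[-1]}'.format({'-1': 'minus one'})
hex_key = '{0[0x0]}'.format({'0x0': 'hex-looking key'})

# On a list the same index fails: '-1' stays a string and is never negated.
list_error = None
try:
    '{0[-1]}'.format(['a', 'b'])
except TypeError as exc:
    list_error = type(exc).__name__

print(from_list, from_dict, hex_key, list_error)
```

So a dict with keys like '-1' or '0x0' can be indexed from a format string, at the cost of negative or non-decimal indices never working on lists.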

The docs are unchanged in 3.4, and the code is effectively unchanged in the current trunk.

I'd suggest searching the bug tracker and the python-dev archives to see if this has been raised before. If it hasn't, decide whether you think the docs or the code should change, file a bug, and ideally submit a patch.

abarnert answered Oct 07 '22 17:10