Reason for allowing Special Characters in Python Attributes

Question

I somewhat accidentally discovered that you can set 'illegal' attributes on an object using setattr. By illegal, I mean attributes whose names can't be retrieved with the traditional . operator. They can only be retrieved via the getattr function.

This, to me, seems rather astonishing, and I'm wondering if there's a reason for this, or if it's just something that was overlooked. Since there is an operator for retrieving attributes, and a standard implementation of the __setattr__ interface, I would expect that implementation to allow only attribute names that can actually be retrieved normally. And if you had some bizarre reason to want attributes with invalid names, you would have to implement your own interface for them.

Am I alone in being surprised by this behavior?

class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")
dir(foo)

This returns something that is both odd, and a little misleading: [...'__weakref__', 'bar.baz']

And if I want to access foo.bar.baz in the 'standard' way, I cannot. The inability to retrieve it makes perfect sense, but the ability to set it is surprising.

foo.bar.baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'bar'
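
For completeness, here is a minimal sketch of how the value can still be reached. It reuses the Foo class from above and assumes nothing beyond the built-ins getattr, hasattr and vars; the variable names are only for illustration.

```python
class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")

# getattr receives the whole string as one name, so the dot is never parsed.
value = getattr(foo, "bar.baz")

# The name is stored as an ordinary string key in the instance dictionary.
stored_keys = list(vars(foo))

# hasattr uses the same string-based lookup.
has_dotted = hasattr(foo, "bar.baz")
has_bar = hasattr(foo, "bar")

print(value, stored_keys, has_dotted, has_bar)
```

So the attribute is not lost; it is just out of reach of the . syntax, which would first look for an attribute named bar.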

Is it simply assumed that, if you have to use setattr to set the variable, you are going to reference it via getattr? Because at runtime, this may not always be true, especially with Python's interactive interpreter, reflection, etc. It still seems very odd that this would be permitted by default.

EDIT: A (very rough) example of what I would expect to see as the default implementation of setattr:

class Safe:
    "stores attrs"

    def __setattr__(self, attr, value):
        # Accept only names that are syntactically valid identifiers.
        if not attr.isidentifier():
            raise AttributeError("Invalid characters in attribute name")
        super().__setattr__(attr, value)

This will not permit me to use invalid characters in my attribute names. Obviously, super() could not be used on the base object class, but this is just an example.
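
A sketch of how a check along these lines might behave in use. The validity test here, str.isidentifier combined with keyword.iskeyword, is an assumption chosen for illustration, not something Python applies by default; the sample names are hypothetical.

```python
import keyword

class Safe:
    "stores attrs, but only under names the . operator can reach"

    def __setattr__(self, attr, value):
        # str.isidentifier() checks identifier syntax; keywords such as
        # "class" pass that check but still cannot follow a dot.
        if not attr.isidentifier() or keyword.iskeyword(attr):
            raise AttributeError(f"Invalid attribute name: {attr!r}")
        super().__setattr__(attr, value)

safe = Safe()
safe.bar = 1               # dot assignment also goes through __setattr__
setattr(safe, "baz_2", 2)  # valid identifier, accepted

rejected = []
for name in ("bar.baz", "bar-baz", "2bar", "class"):
    try:
        setattr(safe, name, None)
    except AttributeError:
        rejected.append(name)

print(rejected, vars(safe))
```

Note that a check like this only guards setattr and dot assignment. Writing to the instance's __dict__ directly would still bypass it.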

mgilson · Accepted Answer

I think that your assumption that attributes must be "identifiers" is incorrect. As you've noted, Python objects support arbitrary attribute names (not just identifiers), because for most objects the attributes are stored in the instance's __dict__, which is a dict and therefore accepts arbitrary string keys. However, in order to have an attribute access operator at all, the set of names reachable through it has to be restricted so that the parser can recognize them unambiguously.
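
A small sketch of the storage described above, assuming an ordinary class without __slots__ (so instances have a __dict__). The attribute names are made up for the example.

```python
class Foo:
    "stores attrs"

foo = Foo()
foo.plain = 1                      # dot syntax: the name must be an identifier
setattr(foo, "bar.baz", 2)         # any string is accepted through setattr
foo.__dict__["with space"] = 3     # writing to __dict__ directly works too

# All three end up as plain string keys in the same dictionary.
names = sorted(vars(foo))
via_getattr = getattr(foo, "with space")

# The one restriction setattr does enforce: the name must be a string.
try:
    setattr(foo, 42, "nope")
    non_string_rejected = False
except TypeError:
    non_string_rejected = True

print(names, via_getattr, non_string_rejected)
```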

"Is it simply assumed that, if you have to use setattr to set the variable, you are going to reference it via getattr?"

No. I don't think that's assumed. I think that the assumption is that if you're referencing attributes using the . operator, then you know what those attributes are. And if you have the ability to know what those attributes are, then you probably have control over what they're called. And if you have control over what they're called, then you can name them something that the parser knows how to handle ;-).
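
To illustrate the point, here is a hedged sketch of the case where the names are not known when the code is written. The Record class and the incoming mapping are hypothetical stand-ins for data that arrives at runtime.

```python
class Record:
    "hypothetical container filled from external data"

# Keys arrive at runtime, so the code never spells them after a dot.
incoming = {"content-type": "text/plain", "x.request.id": "abc123", "size": 12}

record = Record()
for name, value in incoming.items():
    setattr(record, name, value)

# Reading back goes through the same string-based interface.
round_trip = {name: getattr(record, name) for name in incoming}

# Only a name known while writing the code can appear after the . operator.
size = record.size

print(round_trip, size)
```

Code that uses the . operator has to name the attribute in the source, which is exactly the situation where the author can pick a name the parser accepts.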

Tags:

python

least-astonishment

Keozon

1 Answers

mgilson
