I have a sequence of characters (a string, if you will), but I want to store metadata about the origin of the string. Additionally, I want to provide a simplified constructor.
I've tried extending the `str` class in as many ways as Google would resolve for me. I gave up when I came to this:
```python
class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8  # Four Bytes

    def __init__(self, value, flags):
        super(WcStr, self).__init__()
        self.value = value
        self.flags = flags

    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES
        return cls(
            nibbles[flag_nibbles+1:],
            nibbles[:flag_nibbles]
        )
```
When I comment out both arguments to the `cls()` call in the `@classmethod`, I get this error:

```
TypeError: __init__() takes exactly 3 arguments (1 given)
```
That's a pretty typical wrong-number-of-arguments error. With the two arguments (as shown in the example code), I get:

```
TypeError: str() takes at most 1 argument (2 given)
```
I've tried changing the arguments of `__init__` and of `super().__init__`; neither seems to make any difference.
With only one argument passed to the `cls(...)` call, as the `str` class's error asks, I get this:

```
TypeError: __init__() takes exactly 3 arguments (2 given)
```
So I can't win here. What's gone wrong?
P.S. This should probably be a second post, but what property does `str`'s raw string value get put into? I'd like to overload as little of the `str` class as I can to add this metadata into the constructor.
This is exactly what the `__new__` method is for.
In Python, creating an object actually has two steps. In pseudocode:
```python
value = the_class.__new__(the_class, *args, **kwargs)
if isinstance(value, the_class):
    value.__init__(*args, **kwargs)
```
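A minimal runnable sketch of those two steps (Python 3), using a hypothetical `Traced` class that records the order in which Python calls the two methods for a single `Traced(42)` expression:

```python
calls = []

class Traced:
    def __new__(cls, *args, **kwargs):
        # Step 1, construction: create the still-empty instance
        calls.append(('__new__', args))
        return super().__new__(cls)

    def __init__(self, x):
        # Step 2, initialization: fill in the instance's state
        calls.append(('__init__', (x,)))
        self.x = x

t = Traced(42)
print(calls)  # [('__new__', (42,)), ('__init__', (42,))]
```

Both methods receive the same constructor arguments, which is exactly why extra arguments meant for `__init__` also end up at `__new__`.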
The two steps are called construction and initialization. Most types don't need anything fancy in construction, so they can just use the default `__new__` and define an `__init__` method, which is why tutorials and the like only mention `__init__`.
But `str` objects are immutable, so the initializer can't do the usual job of setting up the object's contents: by the time `__init__` runs, the string's value has already been fixed by `__new__`.
So, if you want to change what the `str` actually holds, you have to override its `__new__` method and call the super `__new__` with your modified arguments.
In this case, you don't actually want to do that, but you do want to make sure `str.__new__` doesn't see your extra arguments, so you still need to override it, just to hide those arguments from it.
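To see why `str.__new__` has to be shielded, here is a sketch (Python 3; the class name `OnlyInit` is hypothetical) of a subclass that overrides only `__init__`. The extra argument flows into `str.__new__`, which rejects it before `__init__` ever runs. The exact error text differs between Python versions, so the sketch only relies on the exception type:

```python
class OnlyInit(str):
    # No __new__ override: str.__new__ receives every constructor argument
    def __init__(self, value, flags):
        self.flags = flags

try:
    OnlyInit('abc', 'ff')
except TypeError as exc:
    # In Python 3, str('abc', 'ff') treats 'ff' as an encoding and refuses
    print('str.__new__ rejected the extra argument:', exc)
```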
Meanwhile, you ask:
what property does str's raw string value get put into?
It doesn't. What would be the point? Its value is a string, so you'd have a `str` which had an attribute which was the same `str` which had an attribute which, etc., ad infinitum.
Under the covers, of course, it has to be storing something. But that's under the covers. In CPython, for example, the `str` class is implemented in C, and among other things it contains a C array of the actual bytes used to represent the string (a `char` array in Python 2; since Python 3.3, a compact array of 1-, 2-, or 4-byte units per character, as described in PEP 393). You can't access that directly from Python code.
But, as a subclass of `str`, if you want to know your value as a string, that's just `self`. That's the whole point of being a subclass, after all.
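A short illustration with a hypothetical `Tagged` subclass (Python 3): inside a method, `self` already is the string, so ordinary `str` operations work on it directly, and `str(self)` hands back a plain `str` copy of the value if you ever need one:

```python
class Tagged(str):
    def shout(self):
        # self is the string value itself; no hidden attribute is involved
        return self.upper() + '!'

t = Tagged('abc')
print(t == 'abc', len(t), t.shout())  # True 3 ABC!
print(type(str(t)))                   # <class 'str'>
```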
So:
```python
class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8  # Four Bytes

    def __new__(cls, value, *args, **kwargs):
        # explicitly only pass value to the str constructor
        return super(WcStr, cls).__new__(cls, value)

    def __init__(self, value, flags):
        # ... and don't even call the str initializer
        self.flags = flags
```
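A quick check of that class with the question's `new_nibbles` classmethod reattached, written for Python 3. The sample input `'0000000F:hello'` is made up, and it assumes the one character skipped by the question's `flag_nibbles+1` slice is a separator:

```python
class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8  # Four Bytes

    def __new__(cls, value, *args, **kwargs):
        # only the value reaches str.__new__
        return super().__new__(cls, value)

    def __init__(self, value, flags):
        self.flags = flags

    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        # the question's simplified constructor, unchanged
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES
        return cls(nibbles[flag_nibbles+1:], nibbles[:flag_nibbles])

w = WcStr.new_nibbles('0000000F:hello')  # assumed ':' separator at index 8
print(repr(w), w.flags)  # 'hello' 0000000F
```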
Of course you don't really need `__init__` here; you could do your initialization along with your construction in `__new__`. But if you don't intend for `flags` to be an immutable, only-set-during-construction kind of value, it makes more conceptual sense to do it in the initializer, just like any normal class.
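For comparison, a sketch of the same idea with the initialization folded into `__new__` (Python 3 syntax). `__init__` is no longer overridden here; `str` defines no initializer of its own, and in CPython the one it inherits from `object` does not complain about the extra argument, because `__new__` is overridden:

```python
class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8  # Four Bytes

    def __new__(cls, value, flags):
        # build the immutable string part from value only...
        self = super().__new__(cls, value)
        # ...then attach the metadata before handing the object back
        self.flags = flags
        return self

w = WcStr('hello', '0000000F')
print(w, w.flags)  # hello 0000000F
```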
Meanwhile:
I'd like to overload as little of the str class as I can
That may not do what you want. For example, `str.__add__` and `str.__getitem__` are going to return a plain `str`, not an instance of your subclass. If that's fine, you're done. If not, you will have to override all of those methods and change them to wrap up the return value with the appropriate metadata. (You can do this programmatically by generating wrappers at class definition time. A `__getattr__` method that generates wrappers on the fly only helps if your class wraps a `str` instead of subclassing it: `__getattr__` is consulted only when normal lookup fails, a subclass already finds every `str` method, and operators such as `+` look up special methods on the type anyway.)
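A sketch of the class-definition-time approach, assuming Python 3 and assuming that a derived string should inherit the original's `flags`. The helper `_rewrap` and the list of method names are hypothetical; extend the list to whichever methods you actually use:

```python
class WcStr(str):
    def __new__(cls, value, flags=None):
        self = super().__new__(cls, value)
        self.flags = flags
        return self

def _rewrap(name):
    """Build a method that calls str's version, then re-wraps a plain str result."""
    base = getattr(str, name)

    def method(self, *args, **kwargs):
        result = base(self, *args, **kwargs)
        if type(result) is str:
            # carry the metadata over to the derived string
            return type(self)(result, self.flags)
        return result  # non-str results pass through unchanged

    method.__name__ = name
    return method

# Assigning after class creation also updates operator slots such as +
for _name in ('__add__', '__getitem__', 'upper', 'lower', 'strip', 'replace'):
    setattr(WcStr, _name, _rewrap(_name))

s = WcStr('abc', 'ff')
t = (s + 'def').upper()
print(type(t).__name__, t, t.flags)  # WcStr ABCDEF ff
```

Note that `'x' + s` still returns a plain `str`, because the left operand's `str.__add__` handles it.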
One last thing to consider: the `str` constructor doesn't take exactly one argument.
It can take 0:
str() == ''
And, while this isn't relevant in Python 2, in Python 3 it can take up to 3 (object, encoding, errors):
str(b'abc', 'utf-8') == 'abc'
Plus, even when it takes 1 argument, it obviously doesn't have to be a string:
str(123) == '123'
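If you do keep the subclass, one way to avoid clashing with those signatures (a sketch, assuming Python 3) is to make the metadata a keyword-only argument and forward everything else to `str.__new__` untouched:

```python
class WcStr(str):
    def __new__(cls, *args, flags=None, **kwargs):
        # str's own positional/keyword arguments pass straight through
        self = super().__new__(cls, *args, **kwargs)
        self.flags = flags  # the metadata never reaches str.__new__
        return self

print(repr(WcStr()))                       # ''
print(WcStr(b'abc', 'utf-8', flags='ff'))  # abc
print(WcStr(123) == '123')                 # True
```

The trade-off is that callers must spell out `flags=...` every time.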
So… are you sure this is the interface you want? Maybe you'd be better off creating an object that owns a string (in `self.value`) and just using it explicitly. Or even using it implicitly, duck-typing as a `str` by delegating most or all of the `str` methods to `self.value`.
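A minimal sketch of that wrapper alternative, assuming Python 3; the class name `WcValue` is hypothetical. Here `__getattr__` does work for ordinary methods such as `.upper()`, but special methods like `__len__` and `__str__` are looked up on the type, so they have to be delegated by hand:

```python
class WcValue:
    """Owns a string plus its metadata instead of being a string."""

    def __init__(self, value, flags):
        self.value = value
        self.flags = flags

    def __getattr__(self, name):
        # only called when normal lookup fails, e.g. for .upper() or .split()
        return getattr(self.value, name)

    # special methods bypass __getattr__, so delegate the ones you need
    def __str__(self):
        return self.value

    def __len__(self):
        return len(self.value)

w = WcValue('hello', '0000000F')
print(w.upper(), len(w), str(w), w.flags)  # HELLO 5 hello 0000000F
```

Delegated methods return plain `str` results, so the metadata stays only on the wrapper.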