
Python how to extend `str` and overload its constructor? [duplicate]

I have a sequence of characters, a string if you will, but I want to store metadata about the origin of the string. Additionally, I want to provide a simplified constructor.

I've tried extending the str class in as many ways as Google would resolve for me. I gave up when I came to this:

class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8 # Four Bytes

    def __init__(self, value, flags):
        super(WcStr, self).__init__()
        self.value = value
        self.flags = flags

    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES

        return cls(
            nibbles[flag_nibbles+1:],
            nibbles[:flag_nibbles]
        )

When I comment out both arguments to the cls() call in the classmethod, I get this error:

TypeError: __init__() takes exactly 3 arguments (1 given)

Pretty typical wrong-number-of-args error.

With two arguments (e.g. as shown in the example code):

TypeError: str() takes at most 1 argument (2 given)

I've tried changing the __init__'s args and the super().__init__'s args; neither seems to make any change.

With only one argument passed to the cls(...) call, as the str class's error asks, I get this:

TypeError: __init__() takes exactly 3 arguments (2 given)

So I can't win here. What's gone wrong?


P.S. This should be a second post, but what property does str's raw string value get put into? I'd like to overload as little of the str class as I can to add this metadata into the constructor.

ThorSummoner asked May 05 '15 05:05


1 Answer

This is exactly what the __new__ method is for.

In Python, creating an object actually has two steps. In pseudocode:

value = the_class.__new__(the_class, *args, **kwargs)
if isinstance(value, the_class):
    value.__init__(*args, **kwargs)

The two steps are called construction and initialization. Most types don't need anything fancy in construction, so they can just use the default __new__ and define an __init__ method, which is why tutorials usually mention only __init__.
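To make the two steps visible, here is a minimal sketch (Traced and calls are made-up names for illustration) that records which method runs when, and with which arguments:

```python
calls = []


class Traced(object):
    def __new__(cls, *args, **kwargs):
        calls.append(("__new__", args))
        # object.__new__ accepts no extra arguments, so pass only cls
        return super(Traced, cls).__new__(cls)

    def __init__(self, *args, **kwargs):
        calls.append(("__init__", args))


obj = Traced(1, 2)
# calls now holds the __new__ entry first, then the __init__ entry,
# and both received the same positional arguments
```

Note that both methods get the same arguments, which is exactly why an extra argument meant only for __init__ still reaches __new__.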

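But str objects are immutable: the string's value is fixed as soon as __new__ returns, so the initializer can't do the usual job of setting up the value. By the time __init__ runs, it's too late to change what the str holds (although a subclass instance can still pick up extra attributes there).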

So, if you want to change what the str actually holds, you have to override its __new__ method, and call the super __new__ with your modified arguments.
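As a rough sketch of that first case (StrippedStr is a hypothetical name, not something from the question), a subclass that changes what the str holds might look like this:

```python
class StrippedStr(str):
    """Hypothetical str subclass that drops surrounding whitespace."""

    def __new__(cls, value):
        # Modify the argument, then let str.__new__ build the real object
        return super(StrippedStr, cls).__new__(cls, value.strip())


s = StrippedStr("  hello  ")
```

Here s compares equal to "hello" and is still an instance of StrippedStr.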

In this case, you don't actually want to do that… but you do want to make sure str.__new__ doesn't see your extra arguments, so you still need to override it, just to hide those arguments from it.
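To see why the extra arguments have to be hidden, here is a small sketch that reproduces the failure with a hypothetical Broken class that only defines __init__. The exact message differs between Python versions, so the code only captures the exception rather than relying on its text:

```python
class Broken(str):
    def __init__(self, value, flags):
        self.flags = flags


error = None
try:
    # str.__new__ receives ("abc", "f") before __init__ ever runs
    Broken("abc", "f")
except TypeError as exc:
    error = exc
```

The TypeError comes from str.__new__, not from __init__, so no amount of changing the __init__ signature will fix it.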


Meanwhile, you ask:

what property does str's raw string value get put into?

It doesn't. What would be the point? Its value is a string, so you'd have a str which had an attribute which was the same str which had an attribute which etc. ad infinitum.

Under the covers, of course, it has to be storing something. But that's under the covers. In particular, in CPython, the str class is implemented in C, and it contains, among other things, a C char * array of the actual bytes used to represent the string. You can't access that directly.

But, as a subclass of str, if you want to know your value as a string, that's just self. That's the whole point of being a subclass, after all.
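A tiny sketch of what "that's just self" means in practice (Shouty and shout are invented for illustration):

```python
class Shouty(str):
    """Hypothetical subclass showing that self is the string value."""

    def shout(self):
        # No self.value needed: str methods work directly on self
        return self.upper() + "!"


s = Shouty("hello")
loud = s.shout()
```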


So:

class WcStr(str):
    """wc value and string flags"""

    FLAG_NIBBLES = 8 # Four Bytes

    def __new__(cls, value, *args, **kwargs):
        # explicitly only pass value to the str constructor
        return super(WcStr, cls).__new__(cls, value)

    def __init__(self, value, flags):
        # ... and don't even call the str initializer 
        self.flags = flags

Of course you don't really need __init__ here; you could do your initialization along with your construction in __new__. But if you don't intend for flags to be an immutable, only-set-during-construction kind of value, it makes more conceptual sense to do it in the initializer, just like any normal class.
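For comparison, a sketch of the all-in-__new__ variant might look like the following. The sample values are made up, and the class is trimmed down to the constructor:

```python
class WcStr(str):
    """wc value and string flags"""

    def __new__(cls, value, flags):
        # Build the immutable str part from value only
        self = super(WcStr, cls).__new__(cls, value)
        # Attach the metadata to the freshly built instance
        self.flags = flags
        return self


s = WcStr("deadbeef", "0000000f")
```

Either way, s behaves like the string "deadbeef" and carries s.flags alongside it.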


Meanwhile:

I'd like to overload as little of the str class as I can

That may not do what you want. For example, str.__add__ and str.__getitem__ are going to return a plain str, not an instance of your subclass. If that's good, then you're done. If not, you will have to override all of those methods and change them to wrap up the return value with the appropriate metadata. (You can do this programmatically by generating wrappers at class definition time. A __getattr__ method that generates wrappers on the fly only helps if you delegate to a wrapped string rather than subclass str, because __getattr__ is only called when normal lookup fails, and str already provides these methods.)
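Here is one way generating the wrappers could look. This is a sketch under assumptions: TaggedStr, tag, and _keep_tag are invented names, and the list of wrapped methods is deliberately incomplete.

```python
class TaggedStr(str):
    """Hypothetical str subclass whose wrapped methods keep `tag`."""

    def __new__(cls, value, tag=None):
        self = super(TaggedStr, cls).__new__(cls, value)
        self.tag = tag
        return self


def _keep_tag(name):
    str_method = getattr(str, name)

    def wrapper(self, *args, **kwargs):
        result = str_method(self, *args, **kwargs)
        # Re-wrap only plain str results; ints, lists, etc. pass through
        if type(result) is str:
            return TaggedStr(result, self.tag)
        return result

    wrapper.__name__ = name
    return wrapper


# Illustrative, incomplete list of methods to wrap
for _name in ("__add__", "__getitem__", "upper", "lower", "strip"):
    setattr(TaggedStr, _name, _keep_tag(_name))

s = TaggedStr("abc", tag="origin-1")
t = (s + "def")[1:4].upper()
```

Every step in that last expression goes through a wrapper, so t is still a TaggedStr with the original tag. Methods not in the list (join, replace, and so on) would still return plain str.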


One last thing to consider: the str constructor doesn't take exactly one argument.

It can take 0:

str() == ''

And, while this isn't relevant in Python 2, in Python 3 it can take 2:

str(b'abc', 'utf-8') == 'abc'

Plus, even when it takes 1 argument, it obviously doesn't have to be a string:

str(123) == '123'
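If you did want to keep all of those call shapes, one possible approach (an assumption, not something the question asks for) is to make flags keyword-only and forward everything else to str untouched. FlexStr is a made-up name:

```python
class FlexStr(str):
    """Hypothetical subclass that forwards str's own arguments as-is."""

    def __new__(cls, *args, **kwargs):
        # Hide only our own keyword from str.__new__
        kwargs.pop("flags", None)
        return super(FlexStr, cls).__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        self.flags = kwargs.pop("flags", None)


empty = FlexStr()
number = FlexStr(123, flags="f1")
decoded = FlexStr(b"abc", "utf-8", flags="f2")  # two-argument form: Python 3 only
```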

So… are you sure this is the interface you want? Maybe you'd be better off creating an object that owns a string (in self.value), and just using it explicitly. Or even using it implicitly, duck-typing as a str by just delegating most or all of the str methods to self.value?
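A minimal sketch of that delegating alternative, assuming a hypothetical WcValue class. It forwards ordinary method lookups to the owned string, but special methods such as __len__ or __add__ are looked up on the type, so each one you need has to be defined explicitly:

```python
class WcValue(object):
    """Hypothetical wrapper that owns a string instead of subclassing str."""

    def __init__(self, value, flags):
        self.value = value
        self.flags = flags

    def __str__(self):
        return self.value

    def __len__(self):
        return len(self.value)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so value and flags are
        # normally found first; the guard avoids infinite recursion if
        # they are somehow missing.
        if name in ("value", "flags"):
            raise AttributeError(name)
        # Forward ordinary str methods such as upper() or startswith()
        return getattr(self.value, name)


w = WcValue("deadbeef", "0000000f")
upper = w.upper()
```

The trade-off is that w is not a str, so isinstance(w, str) is False and code that insists on a real str needs str(w) or w.value.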

abarnert answered Oct 07 '22 16:10