
Difference between type alias and NewType

Tags:

python

typing

What is the difference between this:

INPUT_FORMAT_TYPE = NewType('INPUT_FORMAT_TYPE', Tuple[str, str, str])

and this:

INPUT_FORMAT_TYPE = Tuple[str, str, str]

Functionally, both work, but IDEs such as PyCharm flag code like this:

return cast(INPUT_FORMAT_TYPE, ("*", "*", "All"))
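A minimal sketch of what each form does at runtime, assuming Python 3.8+ and the standard `typing` module. The names `InputFormatAlias` and `InputFormatNew` are hypothetical, used only to keep both versions side by side. The alias is interchangeable with `Tuple[str, str, str]`. The `NewType` is a distinct type only to a static checker, and calling it simply returns its argument, so with a `NewType` the conventional spelling is `InputFormatNew(value)` rather than `cast(...)`.

```python
from typing import NewType, Tuple, cast

# Type alias: just another name for Tuple[str, str, str].
InputFormatAlias = Tuple[str, str, str]

# NewType: a distinct subtype for static checkers; at runtime it is a
# callable that returns its argument unchanged.
InputFormatNew = NewType("InputFormatNew", Tuple[str, str, str])

value = ("*", "*", "All")

# cast() never converts anything; it only tells the checker what to assume.
via_cast = cast(InputFormatAlias, value)

# Calling the NewType is the usual way to produce a value of that type.
via_newtype = InputFormatNew(value)

print(InputFormatAlias == Tuple[str, str, str])  # True: the alias is the same type
print(via_cast is value, via_newtype is value)   # True True: no copies, no wrappers
print(type(via_newtype).__name__)                # tuple: no new runtime class
```

In other words, the difference between the two declarations exists only for tools like mypy or PyCharm's inspector; the running program sees ordinary tuples either way.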
hasii asked Dec 30 '22 22:12



1 Answer

InputFormat (I renamed INPUT_FORMAT_TYPE to follow the usual CamelCase naming for types) can be either a subtype or an alias of Tuple[str, str, str]. Making it a subtype (your first example) rather than an alias (your second example) is useful when you want a static checker such as mypy to verify that every InputFormat was created in a particular way. For example:

from typing import NewType, Tuple

InputFormat = NewType('InputFormat', Tuple[str, str, str])

def new_input_format(a: str) -> InputFormat:
    return InputFormat((a, a * 2, a * 4))

def print_input_format(input_format: InputFormat) -> None:
    print(input_format)

print_input_format(new_input_format("a")) # Statement 1
print_input_format(("a", "aa", "aaa"))    # Statement 2

If InputFormat is declared as an alias (InputFormat = Tuple[str, str, str]), both statements pass static type checking. If it is declared as a subtype (InputFormat = NewType('InputFormat', Tuple[str, str, str])), only the first statement does: a plain tuple is not accepted where an InputFormat is expected.

Now this isn't foolproof. A third statement such as:

print_input_format(InputFormat(("a", "aa", "aaa")))

passes static checking, yet it bypasses our dedicated constructor, new_input_format. Still, because InputFormat is a subtype, we had to acknowledge explicitly that we were creating an input format by wrapping the tuple in InputFormat(...). That makes this kind of code easier to maintain and makes questionable input-format constructions easier to spot.
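Because NewType exists only for the type checker, all three statements behave identically at runtime. The sketch below assumes the NewType declaration of InputFormat; the `seen` list is a hypothetical addition used only to record what each call received. The comments note which calls a checker such as mypy would reject, but every call actually runs.

```python
from typing import List, NewType, Tuple

InputFormat = NewType("InputFormat", Tuple[str, str, str])

# Hypothetical: records every value passed in, so runtime behaviour can be inspected.
seen: List[InputFormat] = []

def new_input_format(a: str) -> InputFormat:
    # The single place we intend InputFormat values to come from.
    return InputFormat((a, a * 2, a * 4))

def print_input_format(input_format: InputFormat) -> None:
    seen.append(input_format)
    print(type(input_format).__name__, input_format)

print_input_format(new_input_format("a"))            # Statement 1: passes a static checker
print_input_format(("a", "aa", "aaa"))               # Statement 2: a checker such as mypy reports an error
print_input_format(InputFormat(("a", "aa", "aaa")))  # Statement 3: passes, yet skips new_input_format

# All three calls ran: NewType is not enforced at runtime, and every
# value is a plain tuple with no record of how it was built.
print(all(type(x) is tuple for x in seen))
```

This is why the protection is only as good as the discipline around explicit InputFormat(...) calls: the checker makes them visible, but nothing stops them.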

Another example where NewType is beneficial over a type alias:

Say we have a database for which we expose two functions:

from typing import Optional

# User is the application's user model (not shown here).

def read_user_id_from_session_id(session_id: str) -> Optional[str]:
    ...

def read_user(user_id: str) -> User:
    ...

intended to be called like this (exhibit A):

user_id = read_user_id_from_session_id(session_id)

if user_id:
    user = read_user(user_id)

    # Do something with `user`.
else:
    print("User not found!")
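To make exhibit A concrete, here is a runnable sketch backed by an in-memory store. The `User` dataclass, the `_SESSIONS` and `_USERS` dictionaries, their contents, and the `greet` wrapper are hypothetical stand-ins for a real database and caller; only the two function names and exhibit A's control flow come from the text above. `greet` returns its message instead of printing it so the result is easy to check.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class User:
    # Hypothetical user record standing in for a database row.
    user_id: str
    name: str

# Hypothetical tables: session ID -> user ID, and user ID -> user.
_SESSIONS: Dict[str, str] = {"sess-42": "user-7"}
_USERS: Dict[str, User] = {"user-7": User("user-7", "Ada")}

def read_user_id_from_session_id(session_id: str) -> Optional[str]:
    # None when the session is unknown.
    return _SESSIONS.get(session_id)

def read_user(user_id: str) -> User:
    # Raises KeyError for an unknown ID, standing in for a failed lookup.
    return _USERS[user_id]

def greet(session_id: str) -> str:
    # Exhibit A: the user ID is obtained from the session first.
    user_id = read_user_id_from_session_id(session_id)
    if user_id:
        user = read_user(user_id)
        return f"Hello, {user.name}"
    return "User not found!"

print(greet("sess-42"))
print(greet("sess-0"))
```

Note that with these hints both parameters are plain `str`, so a type checker has no way to tell a session ID from a user ID.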

Ignore the fact that a join could turn this into one query instead of two. The point is that we want only a return value of read_user_id_from_session_id to be passed to read_user, since in our system a user ID can only come from a session. Accepting any string there is probably a mistake. Imagine we did this (exhibit B):

user = read_user(session_id)

To a quick reader this may look correct; they might assume a query like select * from users where session_id = $1 is running. In fact it treats a session_id as a user_id, and with the current type hints it passes static checking even though it misbehaves at runtime. Instead, we can change the type hints to this:

from typing import NewType, Optional

UserID = NewType("UserID", str)

def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
    ...

def read_user(user_id: UserID) -> User:
    ...

Exhibit A still type-checks, because the data flows the intended way. Exhibit B, however, would have to become:

read_user(UserID(session_id))

which makes the problem obvious: a session_id is being converted to a UserID without going through the required function.
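A sketch of the NewType version, again with hypothetical in-memory data (`_SESSIONS`, `_USERS`, and their contents are assumptions, and `read_user` is simplified to return a name or `None` instead of a `User`). Exhibit A type-checks unchanged; exhibit B now needs an explicit `UserID(session_id)`. At runtime `UserID(...)` just returns the string it was given, so the mistaken lookup still only fails when it runs.

```python
from typing import Dict, NewType, Optional

UserID = NewType("UserID", str)

# Hypothetical in-memory tables. The data layer is the one place that
# mints UserID values; user records are reduced to plain names here.
_SESSIONS: Dict[str, UserID] = {"sess-42": UserID("user-7")}
_USERS: Dict[UserID, str] = {UserID("user-7"): "Ada"}

def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
    return _SESSIONS.get(session_id)

def read_user(user_id: UserID) -> Optional[str]:
    # Simplified: returns the user's name, or None if there is no such user.
    return _USERS.get(user_id)

session_id = "sess-42"

# Exhibit A: type-checks without any explicit conversion.
user_id = read_user_id_from_session_id(session_id)
if user_id:
    print(read_user(user_id))

# Exhibit B: read_user(session_id) is rejected by a checker (str is not UserID).
# The explicit conversion below silences the checker but is easy to spot in review.
wrong = read_user(UserID(session_id))
print(wrong)  # None, because "sess-42" is a session ID, not a user ID

# UserID(...) returns its argument unchanged; nothing is enforced at runtime.
print(type(UserID(session_id)) is str)
```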

Languages whose type systems let a module hide a type's constructor can take this a step further: they can forbid explicit construction like UserID(...) everywhere except one place, so every value of that type must come from that place. In Python, anyone can bypass the intended flow of data by calling YourNewType(...) directly, since it is an ordinary callable that returns its argument unchanged. So while NewType is an improvement over a plain type alias, it cannot enforce a single point of construction.

Mario Ishac answered Jan 06 '23 03:01
