What is the difference between this:

    INPUT_FORMAT_TYPE = NewType('INPUT_FORMAT_TYPE', Tuple[str, str, str])

and this:

    INPUT_FORMAT_TYPE = Tuple[str, str, str]

Functionally, both work, but IDEs like PyCharm flag code like this:

    return cast(INPUT_FORMAT_TYPE, ("*", "*", "All"))

InputFormat (renamed to keep type notation consistent) can be a subtype or an alias of Tuple[str, str, str]. Having it be a subtype (your first example) instead of an alias (your second example) is useful when you want to statically verify (through something like mypy) that all InputFormats were made in a certain way. For example:

    def new_input_format(a: str) -> InputFormat:
        return InputFormat((a, a * 2, a * 4))

    def print_input_format(input_format: InputFormat) -> None:
        print(input_format)

    print_input_format(new_input_format("a"))  # Statement 1
    print_input_format(("a", "aa", "aaa"))  # Statement 2

If InputFormat is declared as an alias (through InputFormat = Tuple[str, str, str]), both statements will statically verify. If InputFormat is declared as a subtype (through InputFormat = NewType('InputFormat', Tuple[str, str, str])), only the first statement will statically verify.
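
One thing worth keeping in mind about the two declarations: the distinction exists only for the static checker, and at runtime calling a NewType simply hands back the value it was given. Here is a minimal sketch, assuming Python 3 with the standard typing module. The describe function and the variable names are made up for illustration, and the type: ignore comment assumes mypy's arg-type error code:

```python
from typing import NewType, Tuple

InputFormat = NewType("InputFormat", Tuple[str, str, str])


def describe(input_format: InputFormat) -> str:
    # Hypothetical consumer that only accepts the subtype.
    return "/".join(input_format)


plain = ("a", "aa", "aaaa")
wrapped = InputFormat(plain)

# At runtime the NewType call returns its argument unchanged:
# no wrapper object and no new class are created.
runtime_type = type(wrapped)
same_value = wrapped == plain

# Accepted by a static checker, since the value was explicitly wrapped.
result_ok = describe(wrapped)

# Runs fine at runtime too, but a checker such as mypy is expected to
# reject it, hence the ignore comment.
result_unchecked = describe(plain)  # type: ignore[arg-type]
```

So the second call is only "wrong" in the eyes of the type checker; nothing stops it from executing.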

Now this isn't foolproof. A third statement such as:

    print_input_format(InputFormat(("a", "aa", "aaa")))

will statically verify, yet it bypasses our careful InputFormat creator called new_input_format. However, by making InputFormat a subtype here, we were forced to explicitly acknowledge that we're creating an input format by wrapping the tuple in an InputFormat, which makes this kind of code easier to maintain and makes potential bugs in input format construction easier to spot.

Another example where NewType is beneficial over a type alias:

Let's say you have a database for which we expose two functions:

    def read_user_id_from_session_id(session_id: str) -> Optional[str]:
        ...

    def read_user(user_id: str) -> User:
        ...

intended to be called like this (exhibit A):

    user_id = read_user_id_from_session_id(session_id)
    if user_id:
        user = read_user(user_id)
        # Do something with `user`.
    else:
        print("User not found!")

Forget about the fact that we could use a join here to make this one query instead of two. Anyway, we want to allow only a return value of read_user_id_from_session_id to be passed to read_user (since in our system, a user ID can only come from a session). We don't want to allow an arbitrary string, because passing one is probably a mistake. Imagine we did this (exhibit B):

    user = read_user(session_id)

To a quick reader, it may appear correct. They'd probably think a select * from users where session_id = $1 is happening. However, this actually treats a session_id as a user_id, and with our current type hints it passes type checking despite causing unintended behavior at runtime. Instead, we can change the type hints to this:

    UserID = NewType("UserID", str)

    def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
        ...

    def read_user(user_id: UserID) -> User:
        ...

Exhibit A above would still work, because the flow of data is correct. But we'd have to turn exhibit B into

    read_user(UserID(session_id))

which quickly points out the problem of converting a session_id to a user_id without going through the required function.
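
Putting the pieces above together, here is a minimal runnable sketch. The in-memory dictionaries _SESSIONS and _USERS, the User dataclass, and the load_user helper are hypothetical stand-ins for the real database and calling code; they are not part of the original example:

```python
from dataclasses import dataclass
from typing import Dict, NewType, Optional

UserID = NewType("UserID", str)


@dataclass
class User:
    user_id: UserID
    name: str


# Hypothetical in-memory tables standing in for the database.
_SESSIONS: Dict[str, UserID] = {"session-1": UserID("user-42")}
_USERS: Dict[UserID, User] = {UserID("user-42"): User(UserID("user-42"), "Ada")}


def read_user_id_from_session_id(session_id: str) -> Optional[UserID]:
    # The only function intended to produce a UserID.
    return _SESSIONS.get(session_id)


def read_user(user_id: UserID) -> User:
    return _USERS[user_id]


def load_user(session_id: str) -> Optional[User]:
    # Exhibit A: the data flows session_id -> UserID -> User.
    user_id = read_user_id_from_session_id(session_id)
    if user_id is None:
        return None
    return read_user(user_id)
```

With these definitions, a checker such as mypy should accept load_user as written, while read_user(session_id) would be reported because a plain str is not a UserID. Writing read_user(UserID("session-1")) silences the checker but still fails at runtime with a KeyError in this sketch, which is exactly the kind of explicit, easy-to-spot conversion described above.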

In other programming languages with better type systems, this can be taken a step further. You can actually prohibit explicit construction like UserID(...) in all but one place, so that everyone has to go through that one place to obtain a value of that type. In Python, you can bypass the intended flow of data by explicitly writing YourNewType(...) anywhere. While NewType is beneficial over plain type aliases, it leaves this feature to be desired.