I've been running into a bit of weirdness with Unions (and Optionals, of course) in Python: it seems that the static type checker tests properties against all members of a union, rather than against a single member (i.e. it seems overly strict?). As an example, consider the following:
import pandas as pd

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame()
    df = df.fillna(df)
    return df
This creates a type warning, because pd.DataFrame.fillna(..., inplace: bool = False, ...) -> Optional[pd.DataFrame] (it returns None if inplace=True). I suspect that in theory the static type checker should realize that the return type of the function changes depending on the arguments (as those are known when the code is written), but that's a bit beside the point.
I have the following questions:

1. What is the best way to resolve this? I can think of two solutions:
   i) do nothing, which leaves ugly squiggles in my code
   ii) cast the return of fillna to a pd.DataFrame; my understanding is that this is purely informative for the static type checker, so it should not cause any concerns or issues?

2. Let us consider that I'm writing a function f which, similarly to this, has return types that vary depending on the function call inputs, and this should be determinable before runtime. In order to avoid such errors in the future, what is the best way to go about writing this function? Would it be better to do something like a @typing.overload?
The underlying function should really be defined with overloads; I'd probably suggest a patch to pandas.
Here's what the type looks like right now:
def fillna(
    self: FrameOrSeries,
    value=None,
    method=None,
    axis=None,
    inplace: bool_t = False,
    limit=None,
    downcast=None,
) -> Optional[FrameOrSeries]: ...
In reality, a better way to represent this is with @overload, since the function returns None exactly when inplace=True:
@overload
def fillna(
    self: FrameOrSeries,
    value=None,
    method=None,
    axis=None,
    *,
    # no default here: a Literal[True] parameter cannot default to False,
    # so inplace=True has to be passed explicitly (as a keyword)
    inplace: Literal[True],
    limit=None,
    downcast=None,
) -> None: ...
@overload
def fillna(
    self: FrameOrSeries,
    value=None,
    method=None,
    axis=None,
    inplace: Literal[False] = False,
    limit=None,
    downcast=None,
) -> FrameOrSeries: ...
def fillna(
    self: FrameOrSeries,
    value=None,
    method=None,
    axis=None,
    inplace: bool_t = False,
    limit=None,
    downcast=None,
) -> Optional[FrameOrSeries]:
    # actual implementation
    ...
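For question 2 (writing your own f), the same pattern can be applied directly. Below is a minimal, self-contained sketch using a hypothetical function fill_missing (not part of pandas) that mimics fillna on a plain list. It assumes Python 3.8+ for typing.Literal. The third overload taking a plain bool is an addition beyond the pandas example above: without it, a call like fill_missing(data, 0.0, inplace=flag), where flag is just a bool, would match neither Literal overload.

```python
from typing import List, Literal, Optional, overload


# Hypothetical function mirroring fillna: replaces None entries with `fill`.
@overload
def fill_missing(
    values: List[Optional[float]], fill: float, *, inplace: Literal[True]
) -> None: ...
@overload
def fill_missing(
    values: List[Optional[float]], fill: float, *, inplace: Literal[False] = ...
) -> List[float]: ...
@overload
def fill_missing(
    values: List[Optional[float]], fill: float, *, inplace: bool
) -> Optional[List[float]]: ...


def fill_missing(
    values: List[Optional[float]], fill: float, *, inplace: bool = False
) -> Optional[List[float]]:
    # The implementation keeps the wide signature; callers only see the overloads.
    filled = [fill if v is None else v for v in values]
    if inplace:
        values[:] = filled  # mutate the caller's list and return nothing
        return None
    return filled
```

With these overloads, a checker such as mypy should infer fill_missing(data, 0.0) as List[float] and fill_missing(data, 0.0, inplace=True) as None, so neither call site needs an assert or cast; only the bool-variable case falls back to the Optional.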
But assuming you can't change the underlying library, you have several approaches to narrowing the union. I made a video about this specifically for re.match, but I'll reiterate here since it's basically the same problem (Optional[T]).
1. assert: the assert tells the type checker something it doesn't know, namely that the type is narrower than it thinks. mypy will trust this assertion, and the type will be assumed to be pd.DataFrame:
def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame()
    ret = df.fillna(df)
    assert ret is not None
    return ret
2. cast: explicitly tell the type checker that the type is what you expect, "cast"ing away the None-ness:
from typing import cast

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame()
    ret = cast(pd.DataFrame, df.fillna(df))
    return ret
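On the question's point that cast "should not cause any concerns": typing.cast performs no check at runtime; it simply returns its argument unchanged. So if the value really is None, nothing fails at the cast and the error surfaces later, wherever the value is used. A minimal sketch, with maybe_number as a hypothetical stand-in for any function typed as returning Optional:

```python
from typing import Optional, cast


def maybe_number(flag: bool) -> Optional[int]:
    # Hypothetical helper: returns None when flag is False.
    return 42 if flag else None


value = cast(int, maybe_number(False))
# The type checker now treats `value` as int, but at runtime it is still None.
print(value is None)  # prints True
```

This is why cast is best reserved for cases where you are sure the value cannot be None; the assert approach, by contrast, fails immediately at the point where the assumption breaks.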
3. # type: ignore: the (imo) hacky solution is to tell the type checker to ignore the incompatibility. I would not suggest this approach, but it can be helpful as a quick fix:
def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame()
    ret = df.fillna(df)
    return ret  # type: ignore
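Since re.match is mentioned as the same Optional[T] problem, here is a small sketch of it. It narrows with an explicit "if ... is None: raise" check, a variation on the assert approach that type checkers also understand; unlike assert, it is not stripped when Python runs with -O. The function name and regex are hypothetical.

```python
import re


def parse_version(text: str) -> str:
    # Hypothetical example: re.match returns None when the pattern doesn't match.
    match = re.match(r"v(\d+\.\d+)", text)
    if match is None:
        # After this branch the checker narrows `match` to re.Match[str].
        # Unlike `assert`, this check survives `python -O`.
        raise ValueError(f"no version found in {text!r}")
    return match.group(1)
```

For example, parse_version("v1.2-beta") returns "1.2", while parse_version("release") raises ValueError.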
The pandas.DataFrame.fillna method is defined as returning either DataFrame or None.
If there is a possibility that a function will return None, then this should be documented with an Optional type hint. It would be wrong to try to hide the fact that a function could return None by using a cast or a comment to ignore the warning, such as:

return df  # type: ignore
If the function could return None, use Optional
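import numpy as np
import pandas as pd
from typing import Optional

def test_dummy() -> Optional[pd.DataFrame]:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    # use a new name: assigning an Optional[DataFrame] back to `df`
    # (inferred as DataFrame) would itself be flagged by the checker
    result = df.fillna(value=0)
    return result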
If the function will never return None, there are these options

If you can guarantee that a function will not return None, but this cannot be statically inferred by a type checker, then there are three options.

1. Use assert. This is the approach recommended by the mypy documentation.
def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    result = df.fillna(value=0)
    # narrows result from Optional[DataFrame] to DataFrame
    assert result is not None
    return result
2. Use cast

from typing import cast

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    df = cast(pd.DataFrame, df.fillna(value=0))
    return df
3. Use # type: ignore

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    # a new name keeps the error on the return line, where the ignore comment is
    result = df.fillna(value=0)
    return result  # type: ignore
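If you do use # type: ignore, mypy accepts an error code in brackets, which limits the suppression to that one kind of error instead of silencing everything on the line. A minimal sketch, assuming mypy (other checkers may interpret the bracketed code differently) and a hypothetical maybe_text helper standing in for fillna:

```python
from typing import Optional


def maybe_text(flag: bool) -> Optional[str]:
    # Hypothetical helper standing in for a function typed as returning Optional.
    return "ok" if flag else None


def get_text() -> str:
    result = maybe_text(True)
    # mypy reports this line under the [return-value] error code; naming the code
    # suppresses only that error, so other mistakes on the line still surface.
    return result  # type: ignore[return-value]
```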