
How do I avoid type errors when internal function returns 'Union' that could be 'None'?

I've been running into a bit of weirdness with Unions (and Optionals, of course) in Python: the static type checker seems to test attribute access against every member of a union rather than against any one member, which feels overly strict. As an example, consider the following:

import pandas as pd

def test_dummy() -> pd.DataFrame:
   df = pd.DataFrame()
   df = df.fillna(df)
   return df

This creates a type warning, because DataFrame.fillna(..., inplace: bool = False, ...) -> Optional[pd.DataFrame] (it returns None if inplace=True). I suspect that, in theory, the static type checker should be able to work out that the return type depends on the arguments (since those are known when the code is written), but that's a bit beside the point.

I have the following questions:

  1. What is the best way to resolve this? I can think of two solutions:

    i) do nothing -- which creates ugly squiggles in my code

    ii) cast the return of fillna to a pd.DataFrame; my understanding is that this is purely an informative step for the static type checker, so it should not cause any concerns or issues?

  2. Suppose I'm writing a function f whose return type, similarly, varies depending on the call's arguments in a way that can be determined before runtime. To avoid such errors in the future, what is the best way to write this function? Would it be better to use something like @typing.overload?

deetsb asked Nov 18 '20 16:11



2 Answers

The underlying function should really be defined as an overload -- I'd suggest a patch to pandas probably

Here's what the type looks like right now:

    def fillna(
        self: FrameOrSeries,
        value=None,
        method=None,
        axis=None,
        inplace: bool_t = False,
        limit=None,
        downcast=None,
    ) -> Optional[FrameOrSeries]: ...

in reality, a better way to represent this is to use an @overload -- the function returns None when inplace = True:

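    @overload
    def fillna(
        self: FrameOrSeries,
        value=None,
        method=None,
        axis=None,
        # Literal[True] cannot default to False, so this overload takes
        # inplace (and what follows) by keyword, with no default.
        *,
        inplace: Literal[True],
        limit=None,
        downcast=None,
    ) -> None: ...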


    @overload
    def fillna(
        self: FrameOrSeries,
        value=None,
        method=None,
        axis=None,
        inplace: Literal[False] = False,
        limit=None,
        downcast=None,
    ) -> FrameOrSeries: ...


    def fillna(
        self: FrameOrSeries,
        value=None,
        method=None,
        axis=None,
        inplace: bool_t = False,
        limit=None,
        downcast=None,
    ) -> Optional[FrameOrSeries]:
        # actual implementation
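
For a function you control (the f from the question), the same pattern can be sketched end to end with only the standard library. The function name fill_missing and its parameters are made up for illustration; this is not pandas code, just a minimal example of the technique, assuming Python 3.8+ for Literal.

```python
from typing import List, Literal, Optional, overload


@overload
def fill_missing(
    values: List[Optional[int]], fill: int, *, inplace: Literal[True]
) -> None: ...
@overload
def fill_missing(
    values: List[Optional[int]], fill: int, *, inplace: Literal[False] = ...
) -> List[int]: ...
@overload
def fill_missing(
    values: List[Optional[int]], fill: int, *, inplace: bool
) -> Optional[List[int]]: ...


def fill_missing(
    values: List[Optional[int]], fill: int, *, inplace: bool = False
) -> Optional[List[int]]:
    # Single runtime implementation; the overloads above exist only
    # for the type checker.
    filled = [fill if v is None else v for v in values]
    if inplace:
        values[:] = filled  # mutate the caller's list, return nothing
        return None
    return filled


data: List[Optional[int]] = [1, None, 3]
copy = fill_missing(data, 0)         # checker sees List[int]
fill_missing(data, 0, inplace=True)  # checker sees None
```

The third overload is a fallback for calls where inplace is a plain bool variable rather than a literal; in that case the checker can only promise Optional[List[int]].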

but assuming you can't change the underlying library, you have several approaches to unpacking the union. I made a video about this specifically for re.match, but I'll reiterate here since it's basically the same problem (Optional[T])

option 1: an assert indicating the expected return type

the assert tells the type checker something it doesn't know: that the value's type is narrower than the declared Optional. mypy will trust this assertion and treat the value as pd.DataFrame from that point on

def test_dummy() -> pd.DataFrame:
   df = pd.DataFrame()
   ret = df.fillna(df)
   assert ret is not None
   return ret
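
The same narrowing works for the re.match case mentioned above, which makes for a pandas-free sketch. The function name major_version and the pattern are illustrative assumptions only.

```python
import re


def major_version(text: str) -> int:
    match = re.match(r"(\d+)\.", text)  # typed as Optional[re.Match[str]]
    # Narrow Optional[Match] to Match for the type checker. At runtime
    # this raises AssertionError when the pattern does not match.
    assert match is not None, f"no version found in {text!r}"
    return int(match.group(1))


print(major_version("3.11.4"))  # 3
```

Keep in mind that assert statements are skipped when Python runs with -O, so in that mode the check disappears and only the type-level narrowing is left.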

option 2: cast

explicitly tell the type checker that the type is what you expect, "cast"ing away the None-ness

from typing import cast

def test_dummy() -> pd.DataFrame:
   df = pd.DataFrame()
   ret = cast(pd.DataFrame, df.fillna(df))
   return ret
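
On the question of whether cast is "just informative": typing.cast returns its argument unchanged and performs no runtime check, so a wrong cast goes unnoticed until the value is used. A small sketch, with lookup as a made-up helper:

```python
from typing import Dict, Optional, cast


def lookup(table: Dict[str, int], key: str) -> Optional[int]:
    return table.get(key)


found = cast(int, lookup({"a": 1}, "a"))
missing = cast(int, lookup({"a": 1}, "b"))

print(found)    # 1
print(missing)  # None, even though the checker now believes it is an int
```

So a cast is safe only when the claim it makes is actually true; unlike the assert, it will never fail at the point where the assumption is wrong.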

option 3: type: ignore

the (imo) hacky solution is to tell the type checker to ignore the incompatibility. I would not suggest this approach, but it can be helpful as a quick fix

def test_dummy() -> pd.DataFrame:
   df = pd.DataFrame()
   ret = df.fillna(df)
   return ret  # type: ignore
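
If the ignore route is taken anyway, mypy accepts an error code in the comment, so only that one kind of error is silenced on the line. A sketch using re.match rather than pandas; first_word is a made-up name, and return-value is the code mypy reports for an incompatible return type.

```python
import re


def first_word(text: str) -> "re.Match[str]":
    match = re.match(r"\w+", text)
    # Silences only the incompatible-return error; other errors on this
    # line would still be reported by mypy.
    return match  # type: ignore[return-value]
```

Nothing changes at runtime: the function can still return None, which is exactly what the comment hides from the checker.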

Anthony Sottile answered Sep 27 '22 20:09


The pandas.DataFrame.fillna method is defined as returning either DataFrame or None.

If there is a possibility that a function will return None, then this should be documented by using an Optional type hint. It would be wrong to try to hide the fact that a function could return None by using a cast or a comment to ignore the warning, such as:

return df  # type: ignore

If the function could return None, use Optional

import numpy as np
import pandas as pd
from typing import Optional


def test_dummy() -> Optional[pd.DataFrame]:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    df = df.fillna(value=0)
    return df
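
Declaring the return as Optional moves the None handling to the caller, where an explicit check narrows the type. A minimal standard-library sketch; PORTS, find_port and describe are hypothetical names used only for illustration.

```python
from typing import Dict, Optional

PORTS: Dict[str, int] = {"http": 80, "https": 443}


def find_port(service: str) -> Optional[int]:
    return PORTS.get(service)


def describe(service: str) -> str:
    port = find_port(service)
    if port is None:
        # The None case is handled explicitly instead of being hidden.
        return f"{service}: no known port"
    # From here on the type checker treats port as int.
    return f"{service}: port {port}"


print(describe("http"))    # http: port 80
print(describe("gopher"))  # gopher: no known port
```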

If the function is guaranteed not to return None, there are these options

If you can guarantee that a function will not return None, but it cannot be statically inferred by a type checker, then there are three options.

Option 1: Use an assertion to indicate that DataFrame is not None

This is the approach recommended by the mypy documentation.

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    df = df.fillna(value=0)
    assert df is not None 
    return df
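
If the same check recurs in many places, one variation is a small generic helper that raises instead of asserting. The name unwrap is hypothetical (it is not part of mypy, typing or pandas); this is only a sketch of the idea.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def unwrap(value: Optional[T], message: str = "unexpected None") -> T:
    # Narrows Optional[T] to T for the type checker and fails loudly at
    # runtime if the assumption turns out to be wrong.
    if value is None:
        raise ValueError(message)
    return value


print(unwrap({"a": 1}.get("a")))  # 1
```

Unlike a bare assert, the check in this helper still runs when Python is started with -O.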

Option 2: Use a cast

from typing import cast

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    df = cast(pd.DataFrame, df.fillna(value=0))
    return df

Option 3: Tell mypy to ignore the warning (not recommended)

def test_dummy() -> pd.DataFrame:
    df = pd.DataFrame([np.nan, 2, np.nan, 0])
    df = df.fillna(value=0)
    return df  # type: ignore

Christopher Peisert answered Sep 27 '22 21:09