I am writing a function that returns a Pandas DataFrame object. I would like a type hint that specifies which columns this DataFrame contains, rather than only documenting them in the docstring, so that the end user can more easily understand the data.
Is there a way to type hint DataFrame content like this? Ideally, this would integrate well with tools like Visual Studio Code and PyCharm when editing Python files and Jupyter Notebooks.
An example function:
import pandas as pd


def generate_data(bunch, of, inputs) -> pd.DataFrame:
    """Massages the input to a nice and easy DataFrame.

    :return:
        DataFrame with columns a(int), b(float), c(string), d(us dollars as float)
    """
As of now (Apr 2023), the most powerful project for strongly typing pandas DataFrames is pandera. Unfortunately, what it offers is still quite limited and falls well short of what we might want.
Here is an example of how you can use pandera in your case†:
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame


class MySchema(pa.DataFrameModel):
    a: int
    b: float
    c: str = pa.Field(nullable=True)  # For example, allow None values
    d: float  # US dollars


class OtherSchema(pa.DataFrameModel):
    year: int = pa.Field(ge=1900, le=2050)


def generate_data() -> DataFrame[MySchema]:
    df = pd.DataFrame({
        "a": [1, 2, 3],
        "b": [10.0, 20.0, 30.0],
        "c": ["A", "B", "C"],
        "d": [0.1, 0.2, 0.3],
    })

    # Runtime verification here, throws on schema mismatch
    strongly_typed_df = DataFrame[MySchema](df)
    return strongly_typed_df


def transform(data: DataFrame[MySchema]) -> DataFrame[OtherSchema]:
    # This demonstrates that you can use strongly
    # typed column names from the schema
    df = data.filter(items=[MySchema.a]).rename(
        columns={MySchema.a: OtherSchema.year}
    )

    return DataFrame[OtherSchema](df)  # This will throw on range validation!


df1 = generate_data()
df2 = transform(df1)

transform(df2)  # mypy prints error here - incompatible type!
Running mypy on this code produces a static type check error on the last line: df2 is a DataFrame[OtherSchema], while transform expects a DataFrame[MySchema].

With pandera we get:
- DataFrame schema definitions (dataclass style) and the ability to use them as type hints.
- Constraints beyond column types, such as the allowed range for year (see year in the example above and the pandera docs for more).

What we still miss:
Pandera docs - https://pandera.readthedocs.io/en/stable/dataframe_models.html
Similar question - Type hints for a pandas DataFrame with mixed dtypes
pandas-stubs is an active project that provides type declarations for the pandas public API. These declarations are richer than the type hints included in pandas itself. However, pandas-stubs doesn't provide any facilities for column-level schemas.
There are also quite a few outdated libraries related to this problem and to pandas typing in general: dataenforce, data-science-types, python-type-stubs.
† pandera provides two different APIs that seem to be equally powerful: an object-based API (DataFrameSchema) and a class-based API (DataFrameModel). I demonstrate the latter here.