If a function or method returns a Pandas DataFrame, how do you document the column names and column types? Is there a way to do this within Python's builtin type annotation or do you just use docstrings?
If you do just use docstrings, how do you format them to be as succinct as possible?
To check the data types of a pandas DataFrame, use the “dtypes” attribute. It returns a Series whose index holds the DataFrame's column names and whose values are the corresponding data types.
The dtypes property returns a Series with the data type of each column. The result's index is the original DataFrame's columns. Columns with mixed types are stored with the object dtype.
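As a minimal sketch of the dtypes lookup described above (the frame and its column names are made up for illustration), the column names come from the index of the result and the types from its values:

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this example.
df = pd.DataFrame({
    "rand_int": [1, 2, 3],
    "rand_float": [0.1, 0.2, 0.3],
    "rand_color": ["green", "red", "blue"],
})

# dtypes is a Series: index = column names, values = column dtypes.
column_types = df.dtypes
column_names = list(column_types.index)

print(column_types)
```

The exact dtype reported for the string column depends on the pandas version, so the printed output is not reproduced here.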
PyArrow, the Python binding for the Apache Arrow project, can speed up reading and writing data without otherwise interfering with your data analysis pipeline, since you can still use pandas for further calculations.
You can change the column type in a pandas DataFrame using the df.astype() method.
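A short sketch of that conversion, assuming a small made-up frame: astype() accepts a mapping from column name to target dtype and returns a new DataFrame, leaving the original untouched.

```python
import pandas as pd

# Illustrative frame: one column holds numeric strings, the other integers.
df = pd.DataFrame({"rand_int": ["1", "2", "3"], "rand_float": [1, 2, 3]})

# Convert selected columns; astype() returns a new frame.
converted = df.astype({"rand_int": "int64", "rand_float": "float64"})

print(converted.dtypes)
```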
I use the numpy docstring convention as a basis. If a function's input parameter or return parameter is a pandas dataframe with predetermined columns, then I add a reStructuredText-style table with column descriptions to the parameter description. As an example:
import numpy as np
import pandas as pd

def random_dataframe(no_rows):
    """Return dataframe with random data.

    Parameters
    ----------
    no_rows : int
        Desired number of data rows.

    Returns
    -------
    pd.DataFrame
        Dataframe with randomly selected values. Data columns are as follows:

        ========== ==============================================================
        rand_int   randomly chosen whole numbers (as `int`)
        rand_float randomly chosen numbers with decimal parts (as `float`)
        rand_color randomly chosen colors (as `str`)
        rand_bird  randomly chosen birds (as `str`)
        ========== ==============================================================
    """
    df = pd.DataFrame({
        "rand_int": np.random.randint(0, 100, no_rows),
        "rand_float": np.random.rand(no_rows),
        "rand_color": np.random.choice(['green', 'red', 'blue', 'yellow'], no_rows),
        "rand_bird": np.random.choice(['kiwi', 'duck', 'owl', 'parrot'], no_rows),
    })
    return df
The aforementioned docstring format is compatible with the sphinx autodoc documentation generator. In the HTML documentation that sphinx generates automatically (here using the nature theme), the column descriptions are rendered as a regular table.
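A docstring table is not enforced by anything, so it can drift from the code. One possible way to keep it honest is a small check run in a test. The sketch below is only an illustration: EXPECTED_COLUMNS and check_columns are hypothetical names, not part of pandas or of the answer above, and the mapping simply mirrors the documented columns by hand.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical mapping mirroring the docstring table:
# column name -> predicate that accepts the column's dtype.
EXPECTED_COLUMNS = {
    "rand_int": ptypes.is_integer_dtype,
    "rand_float": ptypes.is_float_dtype,
    "rand_color": ptypes.is_string_dtype,
    "rand_bird": ptypes.is_string_dtype,
}

def check_columns(df, expected):
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for name, dtype_check in expected.items():
        if name not in df.columns:
            problems.append(f"missing column: {name}")
        elif not dtype_check(df[name]):
            problems.append(f"unexpected dtype for {name}: {df[name].dtype}")
    return problems

# Small stand-in for the function's return value.
sample = pd.DataFrame({
    "rand_int": [3, 14, 15],
    "rand_float": [0.9, 0.2, 0.6],
    "rand_color": ["green", "red", "blue"],
    "rand_bird": ["kiwi", "duck", "owl"],
})

problems = check_columns(sample, EXPECTED_COLUMNS)
print(problems)
```

Using dtype predicates rather than exact dtype names keeps the check tolerant of platform and pandas-version differences, at the cost of being less strict.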