How do i unit test python dataframes?
I have functions that have an input and output as dataframes. Almost every function I have does this. Now if i want to unit test this what is the best method of doing it? It seems a bit of an effort to create a new dataframe (with values populated) for every function?
Are there any materials you can refer me to? Should you write unit tests for these functions?
The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements.
While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.
These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values
property. You can define a simple DataFrame and compare what your function returns to what you expect.
One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.
In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.
you can use pandas testing functions:
It will give more flexbile to compare your result with computed result in different ways.
For example:
df1=pd.DataFrame({'a':[1,2,3,4,5]})
df2=pd.DataFrame({'a':[6,7,8,9,10]})
expected_res=pd.Series([7,9,11,13,15])
pd.testing.assert_series_equal((df1['a']+df2['a']),expected_res,check_names=False)
For more details refer this link
I don't think it's hard to create small DataFrames for unit testing?
import pandas as pd
from nose.tools import assert_dict_equal
input_df = pd.DataFrame.from_dict({
'field_1': [some, values],
'field_2': [other, values]
})
expected = {
'result': [...]
}
assert_dict_equal(expected, my_func(input_df).to_dict(), "oops, there's a bug...")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With