Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you Unit Test Python DataFrames

How do i unit test python dataframes?

I have functions that have an input and output as dataframes. Almost every function I have does this. Now if i want to unit test this what is the best method of doing it? It seems a bit of an effort to create a new dataframe (with values populated) for every function?

Are there any materials you can refer me to? Should you write unit tests for these functions?

like image 661
CodeGeek123 Avatar asked Jan 25 '17 13:01

CodeGeek123


People also ask

How do you assert two data frames?

The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements.


3 Answers

While Pandas' test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions that are documented here: NumPy Test Support.

These functions compare NumPy arrays, but you can get the array that underlies a Pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns to what you expect.

One technique you can use is to define one set of test data for a number of functions. That way, you can use Pytest Fixtures to define that DataFrame once, and use it in multiple tests.

In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also did a short presentation about data analysis testing at PyCon Canada 2016: Automate Your Data Analysis Testing.

like image 166
sechilds Avatar answered Oct 16 '22 10:10

sechilds


you can use pandas testing functions:

It will give more flexbile to compare your result with computed result in different ways.

For example:

df1=pd.DataFrame({'a':[1,2,3,4,5]})
df2=pd.DataFrame({'a':[6,7,8,9,10]})

expected_res=pd.Series([7,9,11,13,15])
pd.testing.assert_series_equal((df1['a']+df2['a']),expected_res,check_names=False)

For more details refer this link

like image 21
Mohamed Thasin ah Avatar answered Oct 16 '22 09:10

Mohamed Thasin ah


I don't think it's hard to create small DataFrames for unit testing?

import pandas as pd
from nose.tools import assert_dict_equal

input_df = pd.DataFrame.from_dict({
    'field_1': [some, values],
    'field_2': [other, values]
})
expected = {
    'result': [...]
}
assert_dict_equal(expected, my_func(input_df).to_dict(), "oops, there's a bug...")
like image 37
rtkaleta Avatar answered Oct 16 '22 09:10

rtkaleta