I am developing a set of Python scripts to pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to develop a set of unit tests to check the data pre-processing functions, and would like to use a small test pandas DataFrame for which I can work out the answers by hand and use them in assert statements.
I cannot seem to get it to load the DataFrame and pass it to the unit tests using self. My code looks something like this:
    def setUp(self):
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            data = pd.read_csv(INPUT_DIR + test_file_name, sep = ',', header = 0)
        except IOError:
            print 'cannot open file'
        self.fixture = data

    def tearDown(self):
        del self.fixture

    def test1(self):
        self.assertEqual(somefunction(self.fixture), somevalue)

    if __name__ == '__main__':
        unittest.main()
Thanks for the help.
You could mock out the entire DataFrame class using mock.patch("pandas.DataFrame", ...). Note: the patch target must name the module as pandas, not pd, regardless of how (or even whether) you imported pandas in the current module.
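A minimal sketch of that mocking approach, assuming the function under test looks up `pd.DataFrame` at call time. `build_frame` and `run_with_mocked_dataframe` are hypothetical names used only for illustration.

```python
from unittest import mock

import pandas as pd


def build_frame(rows):
    """Hypothetical function under test: wraps rows in a DataFrame."""
    return pd.DataFrame(rows)


def run_with_mocked_dataframe():
    # The target string names the real module ("pandas"), not the local alias "pd".
    with mock.patch("pandas.DataFrame") as fake_df:
        result = build_frame([{"a": 1}])
        # The mock records how the class was called.
        fake_df.assert_called_once_with([{"a": 1}])
        # While patched, the "DataFrame" is whatever the mock returns.
        return result is fake_df.return_value
```

Mocking the class checks how your code calls pandas, not what the data looks like afterwards, so it probably suits tests of calling code better than tests of the pre-processing logic itself.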
The unittest unit testing framework was originally inspired by JUnit and has a similar flavor as major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.
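As a rough illustration of the shared setup and shutdown code and the aggregation into collections described above, here is a small sketch. `FixtureTests` and `run_suite` are made-up names, and the list fixture stands in for real test data.

```python
import unittest


class FixtureTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh fixture.
        self.values = [3, 1, 2]

    def tearDown(self):
        # Runs after every test method, even if the test failed.
        del self.values

    def test_sorted(self):
        self.assertEqual(sorted(self.values), [1, 2, 3])

    def test_length(self):
        self.assertEqual(len(self.values), 3)


def run_suite():
    # Collect the test methods into a suite and run it with a text reporter.
    suite = unittest.TestLoader().loadTestsFromTestCase(FixtureTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```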
The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements.
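A short sketch of how `equals()` behaves, using made-up frames. Note that it returns a plain boolean rather than raising, that missing values in the same position count as equal, and that the column dtypes must match.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3.0, None]})
same = pd.DataFrame({"a": [1, 2], "b": [3.0, None]})
different = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
as_float = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, None]})

same_result = left.equals(same)            # NaNs in the same position compare equal
different_result = left.equals(different)  # one element differs
dtype_result = left.equals(as_float)       # same values, but column "a" is float, not int

print(same_result, different_result, dtype_result)  # True False False
```

In a unittest this would typically be wrapped as `self.assertTrue(left.equals(same))`, though the failure message then says nothing about where the frames differ.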
Pandas has some utilities for testing.
    import unittest

    import pandas as pd
    # pandas.util.testing was removed in pandas 2.0; pandas.testing is the public location
    from pandas.testing import assert_frame_equal  # <-- for testing dataframes


    class DFTests(unittest.TestCase):
        """Class for running unittests."""

        def setUp(self):
            """Your setUp."""
            TEST_INPUT_DIR = 'data/'
            test_file_name = 'testdata.csv'
            try:
                # the original referenced INPUT_DIR, which is never defined
                data = pd.read_csv(TEST_INPUT_DIR + test_file_name, sep=',', header=0)
            except IOError:
                print('cannot open file')
                raise  # without this, the next line fails with a NameError
            self.fixture = data

        def test_dataFrame_constructedAsExpected(self):
            """Test that the dataframe read in equals what you expect."""
            foo = pd.DataFrame()  # replace with the frame you expect
            assert_frame_equal(self.fixture, foo)
If you are using a recent version of pandas, I think the following way is a bit cleaner:
    import pandas as pd

    pd.testing.assert_frame_equal(my_df, expected_df)
    pd.testing.assert_series_equal(my_series, expected_series)
    pd.testing.assert_index_equal(my_index, expected_index)
Each of these functions will raise AssertionError if they are not "equal".
For more information and options: https://pandas.pydata.org/pandas-docs/stable/reference/general_utility_functions.html#testing-functions
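Putting the pieces together, here is one possible sketch of a test that does not depend on a file on disk. The inline CSV text, the `double_values` function and the class name are all hypothetical stand-ins for your own test data and pre-processing code.

```python
import io
import unittest

import pandas as pd

# Hypothetical inline stand-in for data/testdata.csv
TEST_CSV = """id,value
1,10
2,20
3,30
"""


def double_values(df):
    """Hypothetical pre-processing step: double the 'value' column."""
    out = df.copy()
    out["value"] = out["value"] * 2
    return out


class PreprocessingTests(unittest.TestCase):
    def setUp(self):
        # Parse the inline text exactly as read_csv would parse the file.
        self.fixture = pd.read_csv(io.StringIO(TEST_CSV), sep=",", header=0)

    def test_double_values(self):
        expected = pd.DataFrame({"id": [1, 2, 3], "value": [20, 40, 60]})
        pd.testing.assert_frame_equal(double_values(self.fixture), expected)

    def test_mismatch_raises(self):
        wrong = pd.DataFrame({"id": [1, 2, 3], "value": [0, 0, 0]})
        # assert_frame_equal signals a mismatch by raising AssertionError.
        with self.assertRaises(AssertionError):
            pd.testing.assert_frame_equal(double_values(self.fixture), wrong)
```

Saved in a test module, this can be run with `python -m unittest`. Keeping the fixture small and inline means the expected values can be checked by hand.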