
How to use a pandas data frame in a unit test

I am developing a set of Python scripts to pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to write a set of unit tests to check the data pre-processing functions, using a small test pandas dataframe for which I can work out the expected answers by hand and use them in assert statements.

I cannot seem to load the dataframe and pass it to the unit tests using self. My code looks something like this:

    def setUp(self):
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            data = pd.read_csv(INPUT_DIR + test_file_name,
                sep = ',',
                header = 0)
        except IOError:
            print 'cannot open file'
        self.fixture = data

    def tearDown(self):
        del self.fixture

    def test1(self):
        self.assertEqual(somefunction(self.fixture), somevalue)

    if __name__ == '__main__':
        unittest.main()

Thanks for the help.

tjb305 asked Jan 14 '15 19:01


People also ask

How do I mock a DataFrame in Python?

You could mock out the entire DataFrame class using mock.patch("pandas.DataFrame", ...). Note: the patch target is "pandas.DataFrame", not "pd.DataFrame", regardless of how (or even whether) you imported pandas in the current module.
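As a rough illustration of that patch target, here is a minimal sketch. The function build_frame and the rows data are hypothetical names invented for this example; only mock.patch and pd.DataFrame come from the text above.

```python
from unittest import mock

import pandas as pd


def build_frame(rows):
    """Hypothetical function under test: wraps rows in a DataFrame."""
    return pd.DataFrame(rows)


rows = [{"a": 1}, {"a": 2}]

# Patch the attribute on the pandas module itself, hence "pandas.DataFrame".
with mock.patch("pandas.DataFrame") as fake_dataframe:
    result = build_frame(rows)
    # Record what happened while the patch was active.
    call_args = fake_dataframe.call_args
    returned_mock = result is fake_dataframe.return_value

# Outside the with-block the real class is restored.
restored = isinstance(build_frame(rows), pd.DataFrame)
```

Mocking the whole class is usually only worthwhile when the test cares about how the DataFrame was constructed rather than what it contains. For checking contents, a small real dataframe tends to be simpler.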

What is the use of the unit testing framework in Python?

The unittest unit testing framework was originally inspired by JUnit and has a similar flavor as major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework.

How do you assert that two data frames are equal?

The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements.
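A short sketch of how equals() behaves may help here. The frames below are made up for illustration. Note that equals() returns a boolean rather than raising, and it treats columns with the same values but different dtypes as unequal.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
same = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
# Same values as `left`, but column "a" is stored as float64 instead of int64.
as_float = left.astype({"a": "float64"})

same_result = left.equals(same)        # identical values and dtypes
dtype_result = left.equals(as_float)   # identical values, different dtype

print(same_result, dtype_result)
```

Inside a unittest.TestCase this could be used as self.assertTrue(left.equals(same)), although a failing assertion then only reports "False is not true" with no detail about which values differ.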


2 Answers

Pandas has some utilities for testing.

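    import unittest

    import pandas as pd
    # pandas.testing replaces the older pandas.util.testing, which was removed in pandas 2.0
    from pandas.testing import assert_frame_equal  # <-- for testing dataframes


    class DFTests(unittest.TestCase):
        """Class for running unit tests."""

        def setUp(self):
            """Your setUp."""
            TEST_INPUT_DIR = 'data/'
            test_file_name = 'testdata.csv'
            # The question's code referenced an undefined INPUT_DIR;
            # the variable defined above is TEST_INPUT_DIR.
            # If the file is missing, read_csv raises and the test errors out
            # visibly instead of continuing with `data` unassigned.
            self.fixture = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                                       sep=',',
                                       header=0)

        def test_dataFrame_constructedAsExpected(self):
            """Test that the dataframe read in equals what you expect."""
            foo = pd.DataFrame()  # replace with the dataframe you expect
            assert_frame_equal(self.fixture, foo)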
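For a version that can be run without a data/testdata.csv file on disk, here is a minimal sketch that reads the fixture from an in-memory string. The CSV contents, the add_total function, and the test names are all hypothetical stand-ins for your own data and pre-processing code.

```python
import io
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for the contents of data/testdata.csv.
CSV_TEXT = "a,b\n1,2\n3,4\n"


def add_total(df):
    """Hypothetical pre-processing step: add a column holding a + b."""
    out = df.copy()
    out["total"] = out["a"] + out["b"]
    return out


class PreprocessingTests(unittest.TestCase):
    def setUp(self):
        # Build a fresh fixture before every test so tests cannot affect each other.
        self.fixture = pd.read_csv(io.StringIO(CSV_TEXT), sep=",", header=0)

    def test_add_total(self):
        expected = pd.DataFrame({"a": [1, 3], "b": [2, 4], "total": [3, 7]})
        assert_frame_equal(add_total(self.fixture), expected)

    def test_fixture_is_not_modified(self):
        add_total(self.fixture)
        self.assertEqual(list(self.fixture.columns), ["a", "b"])
```

Saved as a module, this can be run with python -m unittest. Because setUp rebuilds self.fixture before each test, an explicit tearDown that deletes it is not needed.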
Adam Slack answered Oct 02 '22 15:10


If you are using a recent version of pandas, I think the following way is a bit cleaner:

    import pandas as pd

    pd.testing.assert_frame_equal(my_df, expected_df)
    pd.testing.assert_series_equal(my_series, expected_series)
    pd.testing.assert_index_equal(my_index, expected_index)

Each of these functions raises an AssertionError if the two objects are not "equal".
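What counts as "equal" can be adjusted with keyword options. As a small sketch, the helper frames_match below is a hypothetical wrapper written for this example (not part of pandas) that turns the raised AssertionError into a boolean so the effect of check_dtype is easy to see.

```python
import pandas as pd

expected = pd.DataFrame({"a": [1, 2, 3]})
as_float = pd.DataFrame({"a": [1.0, 2.0, 3.0]})   # same values, float64 dtype
wrong_value = pd.DataFrame({"a": [1, 2, 4]})      # last value differs


def frames_match(left, right, **kwargs):
    """Return True if assert_frame_equal passes, False if it raises."""
    try:
        pd.testing.assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


strict = frames_match(expected, as_float)                     # dtypes differ
relaxed = frames_match(expected, as_float, check_dtype=False)  # dtype ignored
mismatch = frames_match(expected, wrong_value)                 # values differ

print(strict, relaxed, mismatch)
```

In a real test you would normally call pd.testing.assert_frame_equal directly, since the message attached to the AssertionError describes where the frames differ.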

For more information and options: https://pandas.pydata.org/pandas-docs/stable/reference/general_utility_functions.html#testing-functions

Steven answered Oct 02 '22 15:10