Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas dataframe: allclose [duplicate]

I've got a script updating 5-10 columns worth of data , but sometimes the start csv will be identical to the end csv so instead of writing an identical csvfile I want it to do nothing...

How can I compare two dataframes to check if they're the same or not?

csvdata = pandas.read_csv('csvfile.csv')
csvdata_old = csvdata

# ... do stuff with csvdata dataframe

if csvdata_old != csvdata:
    csvdata.to_csv('csvfile.csv', index=False)

Any ideas?

like image 247
Ryflex Avatar asked Nov 11 '13 22:11

Ryflex


People also ask

How do I know if 2 DFS are equal?

DataFrame - equals() function The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements.

How do I check if two rows have the same value in pandas?

The equals() function is used to test whether two Pandas objects contain the same elements.


2 Answers

You also need to be careful to create a copy of the DataFrame, otherwise the csvdata_old will be updated with csvdata (since it points to the same object):

csvdata_old = csvdata.copy()

To check whether they are equal, you can use assert_frame_equal as in this answer:

from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)

You can wrap this in a function with something like:

try:
    assert_frame_equal(csvdata, csvdata_old)
    return True
except:  # appeantly AssertionError doesn't catch all
    return False

There was discussion of a better way...

like image 109
Andy Hayden Avatar answered Oct 13 '22 12:10

Andy Hayden


Not sure if this is helpful or not, but I whipped together this quick python method for returning just the differences between two dataframes that both have the same columns and shape.

def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
like image 34
Tom Chapin Avatar answered Oct 13 '22 11:10

Tom Chapin