I've got a script that updates 5-10 columns of data, but sometimes the starting CSV will be identical to the ending CSV, so instead of writing an identical CSV file I want it to do nothing...
How can I compare two dataframes to check if they're the same or not?
csvdata = pandas.read_csv('csvfile.csv')
csvdata_old = csvdata
# ... do stuff with csvdata dataframe
if csvdata_old != csvdata:
    csvdata.to_csv('csvfile.csv', index=False)
Any ideas?
The compare method in pandas (DataFrame.compare, available since pandas 1.1) shows the differences between two DataFrames. It compares them element by element and presents the differing values side by side. It only works on identically labeled DataFrames: both must have the same shape and the same row and column labels, otherwise it raises a ValueError.
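A minimal sketch of how compare could be used here. The frames `old` and `new` and their column names are made up for illustration, and pandas 1.1 or later is assumed.

```python
import pandas as pd

# Two identically labeled frames; only one price differs (illustrative data).
old = pd.DataFrame({"id": [1, 2, 3], "price": [10.0, 20.0, 30.0]})
new = pd.DataFrame({"id": [1, 2, 3], "price": [10.0, 25.0, 30.0]})

# compare() keeps only the rows and columns that differ.
# "self" holds the value from `old`, "other" the value from `new`.
diff = old.compare(new)
print(diff)

# An empty result means no differences were found.
unchanged = old.compare(old.copy()).empty
print(unchanged)
```

For the original question, an empty result from compare would be one way to decide that the file does not need to be rewritten.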
Step 1: Define two pandas Series, s1 and s2.
Step 2: Compare them using the Series.compare() method.
Step 3: Print their difference.
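The three steps above might look like this. The values in `s1` and `s2` are invented for the example, and pandas 1.1 or later is assumed.

```python
import pandas as pd

# Step 1: define two Series with the same index (illustrative values).
s1 = pd.Series(["a", "b", "c", "d"])
s2 = pd.Series(["a", "x", "c", "y"])

# Step 2: compare them; the result has a "self" column (values from s1)
# and an "other" column (values from s2), limited to positions that differ.
difference = s1.compare(s2)

# Step 3: print the difference.
print(difference)
```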
If your two dataframes have the same ids in them (that is, the same index and columns), then finding out what changed is actually pretty easy. Just doing frame1 != frame2 will give you a boolean DataFrame where each True marks a value that has changed. From that, you could easily get the index of each changed row by doing changedids = frame1.
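The snippet in that answer is cut off, so the following is only one possible way to get the changed ids from the boolean frame, not necessarily what the author wrote. The sample data and the name `changed_ids` are assumptions.

```python
import pandas as pd

# Two frames sharing the same index (the "ids") and columns; illustrative data.
ids = [101, 102, 103]
frame1 = pd.DataFrame({"name": ["ann", "bob", "cy"], "score": [1, 2, 3]}, index=ids)
frame2 = pd.DataFrame({"name": ["ann", "bob", "cy"], "score": [1, 5, 3]}, index=ids)

# Element-wise comparison: True wherever the two frames disagree.
changed = frame1 != frame2

# Keep the index labels of rows that contain at least one True.
changed_ids = changed.index[changed.any(axis=1)].tolist()
print(changed_ids)
```

One caveat: NaN != NaN evaluates to True, so rows containing missing values will be reported as changed even when both frames hold NaN in the same place.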
You also need to be careful to create a copy of the DataFrame; otherwise csvdata_old will change whenever csvdata does, since both names point to the same object:
csvdata_old = csvdata.copy()
To check whether they are equal, you can use assert_frame_equal as in this answer:
# pandas.util.testing was removed in pandas 2.0; use pandas.testing instead
from pandas.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
def frames_equal(csvdata, csvdata_old):  # helper name is illustrative
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except Exception:  # apparently AssertionError doesn't catch all
        return False
There was discussion of a better way...
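One candidate for a simpler check is DataFrame.equals, which returns a plain boolean and treats NaNs in the same position as equal. This is a suggestion, not necessarily what that discussion settled on. The sketch below uses a small in-memory frame in place of the CSV, and the names `needs_write` and `needs_write_after_edit` are made up.

```python
import pandas as pd

# Stand-in for pandas.read_csv('csvfile.csv'); illustrative data with a NaN.
csvdata = pd.DataFrame({"a": [1, 2], "b": [3.0, None]})
csvdata_old = csvdata.copy()  # a real copy, not a second name for the same object

# ... do stuff with csvdata; here nothing has changed yet ...
needs_write = not csvdata.equals(csvdata_old)
print(needs_write)

# After an actual modification the frames no longer match.
csvdata.loc[0, "a"] = 99
needs_write_after_edit = not csvdata.equals(csvdata_old)
print(needs_write_after_edit)

# In the real script the write would be guarded like this:
# if not csvdata.equals(csvdata_old):
#     csvdata.to_csv('csvfile.csv', index=False)
```

Note that equals also requires matching dtypes, so a column that changed from int to float counts as a difference even if the values look the same.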
Not sure if this is helpful or not, but I whipped together this quick Python function for returning just the differences between two dataframes that have the same columns and shape.
def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
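A short usage sketch for the function above, repeated here so the example runs on its own. The sample frames are invented for illustration.

```python
import pandas as pd

def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    # With no `on` argument, merge joins on all columns the frames share,
    # so a row only counts as "both" when every value matches.
    merged_df = source_df.merge(new_df, indicator=True, how="outer")
    changed_rows_df = merged_df[merged_df["_merge"] == "right_only"]
    return changed_rows_df.drop("_merge", axis=1)

# Illustrative data: only the row with id 2 differs.
source_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
new_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "B", "c"]})

changed = get_different_rows(source_df, new_df)
print(changed)

# An empty result would mean new_df contains no rows that are missing from source_df.
print(changed.empty)
```

Because only 'right_only' rows are kept, rows that were removed from the source frame are not reported, and the original index is not preserved by the merge.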