I require an assistant with coding a comparison between two tables in python, that is currently done in winmerge.
The code is as follows
import pandas as pd
Last week's table
df1=pd.read_csv(r"C:\Users\ri0a\OneDrive - Department of Environment, Land, Water and Planning\Python practice\pvmodules+_210326.csv")
This week table with new model numbers, and expire dates
df2=pd.read_csv(r"C:\Users\ri0a\OneDrive - Department of Environment, Land, Water and Planning\Python practice\pvmodules+_210401.csv")
The table head is as below
the third column is PV_module certificate: Expiry date. I want to set a logic similar to excel logic '=IF (D2<DATEVALUE("19/04/2021"),"Expired","OK). The objective here is to delete the entire rows where the expiry date is below a specific date/ today's date.
Next,Importing dataframe_diff package
from dataframe_diff import dataframe_diff
Executing the difference
d1_column,d2_additional=dataframe_diff(df1,df2,key=['PV Module Certificate: Licensee/Certificate Holder Account','Model Number/s'])
With this package d2_additional shows if there are new rows associated with model numbers added this week compared to last week.
However, I am trying to replicate the following output
The tasks involved are
Please kindly help me with this.
Thanks in advance.
Now: as with
d2_expires = merged_df[merged_df._merge == 'left_only']
and with
d2_additional = merged_df[merged_df._merge == 'right_only']
I get the same output. Same rows are returned, which should not be the case. As seen from the screen below
This is the same as additions
And Finally, I get an error with d2_comaprison.
d3_comparison = merged_df[merged_df._merge == 'both'].\
loc[lambda x: x.PV Module Certificate: Expiry Date_last_week != x.PV Module Certificate: Expiry Date_this_week]
I am happy to share my data with you. Please email me at [email protected] for data sharing.
You have to ensure to convert the dates into datetime format after loading the data, and rename the columns to something easier to work with (for example 'cert_holder', 'model_no','approval_date','expiry_date')
I want to set a logic similar to excel logic '=IF (D2<DATEVALUE("19/04/2021"),"Expired","OK). The objective here is to delete the entire rows where the expiry date is below a specific date/ today's date.
This (removing) can be done with:
df = df[df['expiry_date'] >= pd.Timestamp('today')]
# Or
df = df[df['expiry_date'] >= pd.Timestamp('2021-04-23')]
But this only works, if your expiry dates are in datetime format.
Next merge the two dataframes:
merged_df = pd.merge(df1,df2, how='outer', on=['cert_holder','model_no'],\
suffixes=['_last_week','_this_week'], indicator=True)
If some model, in this case a row, was included in the last week's table, but is missing in current week's table, I want to assign a new field "Expired" in a new column "Status" next to it./ Or create a new dataframe, d2_expires, from only those missing rows.
d2_expires = merged_df[merged_df._merge == 'left_only']
Another dataframe, where the rows or the product models that were missing last week but added this week remains...As d2_additional.
d2_additional = merged_df[merged_df._merge == 'right_only']
A third dataframe, where any changes (for example expiry date) for same rows (same certificate + same model but different new expiry date) is captures as d3_comparison.
d3_comparison = merged_df[merged_df._merge == 'both'].\
loc[lambda x: x.expiry_date_last_week != x.expiry_date_this_week]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With