I have one frame that looks like this (df):
2000q1 2000q2 2000q3 State RegionName New York New York NaN NaN NaN California Los Angeles 207066.666667 214466.666667 220966.666667 Illinois Chicago 138400.000000 143633.333333 147866.666667
(notice that State,RegionName here is a MultiIndex)
and one frame that looks like this (ut):
State RegionName 0 Alabama Auburn 1 Alabama Florence 2 Alabama Jacksonville 3 Alabama Livingston 4 Alabama Montevallo
So to get all rows where State,RegionName are in both dataframes, I do this:
dfut = pd.merge(df, ut, how='inner', left_index=True, right_on=['State', 'RegionName'])
That works. I now want a list of rows where rows from df frame are NOT in ut frame -- like a "NOT inner join". I am pretty sure that I need to do a LEFT join which will give me the entire df, but I am not sure how to subtract ut intersecting rows out of it. Hope its clear. Thank you
include the parameter indicator=True
in your merge
and query('_merge != "both"')
dfut = pd.merge(df, ut, how='outer',
left_index=True, right_on=['State', 'RegionName'],
indicator=True)
dfut.query('_merge != "both"')
There is a much simpler way without query
You want to do the inner merge of df
and ut
to get all the
'matching' indexes of df
(i.e., indexes of df
that are in ut
)
inner_merge_index = pd.merge(df.reset_index(), ut, how='inner').set_index(['State', 'RegionName']).index
Then, you want to remove all these matched indexes from df
dfut = df.drop(list(inner_merge_index))
Note this also works with MultiIndex. To test it, you may want to expand your test df
(because it has 0 intersection with ut
) by adding a row that is actually preserved when doing the inner merge (added 'Alabama'/'Jacksonville' below). If you test it, you will see this is the only row that gets dropped.
df = pd.DataFrame([['New York', 'New York', 'NaN', 'NaN', 'NaN'],
['California', 'Los Angeles', '207066.666667', '214466.666667', '220966.666667'],
['Illinois', 'Chicago', '138400.000000', '143633.333333', '147866.666667'],
['Alabama', 'Jacksonville', '98765.000000', '54321.333333', '12345.0']],
columns=['State', 'RegionName', '2000q1', '2000q2', '2000q3']).set_index(['State', 'RegionName'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With