I have two dataframes:
df1 = row1;row2;row3
df2 = row4;row5;row6;row2
I want my output dataframe to contain only the rows of df1 that do not also appear in df2, i.e.:
df_out = row1;row3
How do I get this most efficiently?
This code does what I want, but using 2 for-loops:
import pandas as pd

a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

match_ident = []
for i in range(0, len(a)):
    found = False
    for j in range(0, len(b)):
        # A row of a matches a row of b only if both columns are equal
        if a[0][i] == b[0][j]:
            if a[1][i] == b[1][j]:
                found = True
    match_ident.append(not(found))
a = a[match_ident]
To drop a row or column from a dataframe, use its drop() method, which is described in detail in the pandas documentation. By default, rows are labelled with an integer index starting at 0, and columns are labelled by name.
The pandas.DataFrame.drop_duplicates() method removes duplicate rows from a DataFrame. It can compare rows on all columns or only on a selected subset of columns.
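As a small illustration of the two modes just described, here is a minimal sketch. The column names `key` and `value` and the sample data are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 are identical,
# rows 2 and 3 share the same key but differ in value.
df = pd.DataFrame({"key": [1, 1, 2, 2], "value": [10, 10, 20, 25]})

# Compare on all columns: only the exact duplicate (row 1) is removed.
all_cols = df.drop_duplicates()

# Compare on a subset of columns: one row per distinct key is kept.
by_key = df.drop_duplicates(subset=["key"])
```

Note that drop_duplicates() works within a single DataFrame, so on its own it does not answer the question of which rows of one DataFrame are missing from another.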
To delete rows that meet one or more conditions, select the index labels of those rows and pass them to pandas.DataFrame.drop().
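A minimal sketch of that pattern, assuming made-up column names (`key`, `value`) and an arbitrary threshold chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"key": [1, 2, 3], "value": [10, 20, 30]})

# Select the index labels of the rows that satisfy the condition...
to_drop = df[df["value"] > 15].index

# ...and pass those labels to drop(), which returns a new DataFrame.
kept = df.drop(to_drop)
```

In practice, boolean indexing such as `df[df["value"] <= 15]` gives the same rows directly; drop() is mainly useful when the labels to remove are already known.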
You can use merge with the parameter indicator=True and an outer join, query for filtering, and then remove the helper column _merge with drop:
The DataFrames are joined on all columns, so the on parameter can be omitted.
print (pd.merge(a,b, indicator=True, how='outer')
         .query('_merge=="left_only"')
         .drop('_merge', axis=1))

   0   1
0  1  10
2  3  30
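The same idea can be wrapped in a small reusable function. The sketch below is a variant, not the answer's exact code: it uses a left join instead of an outer join (rows that exist only in the second DataFrame are not needed), and it resets the index so the result does not depend on how a given pandas version orders merged rows. The name `rows_only_in_left` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def rows_only_in_left(left, right):
    """Return the rows of `left` that have no identical row in `right`.

    Hypothetical helper; rows are compared on all shared columns.
    """
    # Deduplicate `right` so a repeated row there cannot multiply rows of `left`.
    merged = left.merge(right.drop_duplicates(), how="left", indicator=True)
    # The indicator column `_merge` is "left_only" for rows without a match.
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)


a = pd.DataFrame({0: [1, 2, 3], 1: [10, 20, 30]})
b = pd.DataFrame({0: [0, 1, 2, 3], 1: [0, 1, 20, 3]})

result = rows_only_in_left(a, b)
print(result)
```

With the sample data from the question, the rows (1, 10) and (3, 30) remain, which matches what the nested loops produce.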