
How to remove a subset of a data frame in Python?

My dataframe df is 3020x4. I'd like to remove a subset df1 (20x4) from it. In other words, I want the difference, whose shape is 3000x4. I tried the code below, but it did not work: it returned exactly df. Would you please help? Thanks.

new_df = df.drop(df1)
XUTADO asked Sep 09 '16 09:09


2 Answers

The pandas cheat sheet also suggests the following technique:

adf[~adf.x1.isin(bdf.x1)]

where x1 is the column being compared, adf is the dataframe to filter, and bdf holds the rows to take out. Every row of adf whose x1 value also appears in bdf.x1 is removed.
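A minimal sketch of the isin technique follows. The data is made up for illustration; only the names adf, bdf and x1 come from the answer above. Note that the comparison looks at x1 alone, so a row is removed whenever its x1 value appears in bdf, even if its other columns differ.

```python
import pandas as pd

# Hypothetical data: adf is the full frame, bdf holds the rows to take out.
adf = pd.DataFrame({"x1": ["A", "B", "C", "D"], "x2": [1, 2, 3, 4]})
bdf = pd.DataFrame({"x1": ["B", "D"], "x2": [2, 4]})

# True where the x1 value of adf also appears in bdf.x1.
mask = adf["x1"].isin(bdf["x1"])

# ~ inverts the mask, keeping only the rows that are NOT in bdf.
result = adf[~mask]
print(result)
```

If the rows have to match on several columns at once, a single-column isin is not enough and a merge-based approach is probably a better fit.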

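The particular question asked by the OP can also be solved by dropping the index labels of df1, which works as long as df1 still carries the index labels it had in df: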

new_df = df.drop(df1.index)
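
As a rough check of the shapes mentioned in the question, here is a small sketch with made-up data. It assumes df1 was sliced out of df and therefore shares its index labels; the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the OP's data: 3020 rows, 4 columns.
df = pd.DataFrame(np.arange(3020 * 4).reshape(3020, 4), columns=list("abcd"))

# df1 is sliced from df, so it keeps df's original index labels.
df1 = df.iloc[100:120]

# Drop the rows whose index labels appear in df1.
new_df = df.drop(df1.index)
print(new_df.shape)  # (3000, 4)
```

If df1 had been rebuilt with a fresh index (for example after reset_index), its labels would no longer identify the same rows and this approach would drop the wrong ones.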
gciriani answered Nov 03 '22 04:11


As you seem to be unable to post a representative example, I will demonstrate one approach using merge with the parameter indicator=True.

First, generate some data:

In [116]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237

Take a subset:

In [118]:
df_subset = df.iloc[2:3]
df_subset

Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084

Now perform a left merge with indicator=True. This adds a _merge column that marks each row as left_only, both, or right_only (the last cannot appear in a left merge). Because no on argument is given, the merge matches on all columns the two frames share. Then filter the merged df to keep only the left_only rows:

In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new

Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only

For reference, here is the merged df before filtering:

In [122]:
df.merge(df_subset, how='left', indicator=True)

Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
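
The merge-and-filter steps can be wrapped in a small helper that also drops the _merge column afterwards. This is only a sketch: remove_rows is a hypothetical name, not a pandas function, and the integer data is made up so that the exact-match comparison is easy to follow.

```python
import pandas as pd

def remove_rows(df, subset):
    """Return the rows of df that have no exact match in subset.

    Hypothetical helper: matches on all columns the two frames share.
    """
    # De-duplicate subset so a left row can never be repeated by the merge.
    merged = df.merge(subset.drop_duplicates(), how="left", indicator=True)
    # Keep rows found only in df, then discard the helper column.
    kept = merged[merged["_merge"] == "left_only"]
    return kept.drop(columns="_merge")

# Made-up data for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
df_subset = df.iloc[2:3]

result = remove_rows(df, df_subset)
print(result)
```

Unlike dropping by index, this compares row values, so it should also work when the subset no longer shares index labels with df. The merge does build a new default index, so any custom index on df is not preserved.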
EdChum answered Nov 03 '22 05:11