I have two dataframes, df1 and df2.
df1:
contig position tumor_f t_ref_count t_alt_count
1 14599 0.000000 1 0
1 14653 0.400000 3 2
1 14907 0.333333 6 3
1 14930 0.363636 7 4
df2:
contig position
1 14599
1 14653
I would like to remove the rows from df1 with matching contig, position values in df2. Something akin to: df1[df1[['contig','position']].isin(df2[['contig','position']])]
Except this doesn't work.
Version 0.13 is adding an isin method to DataFrame that will accomplish this. If you're using the current master you can try:
In [46]: df1[['contig', 'position']].isin(df2.to_dict(outtype='list'))
Out[46]:
contig position
0 True True
1 True True
2 True False
3 True False
To get the rows that are not contained in df2, negate the mask with ~ and use it to index df1:
In [45]: df1.ix[~df1[['contig', 'position']].isin(df2.to_dict(outtype='list')).all(axis=1)]
Out[45]:
contig position tumor_f t_ref_count t_alt_count
2 1 14907 0.333333 6 3
3 1 14930 0.363636 7 4
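For readers on a recent pandas release, here is a minimal sketch of the same idea. It assumes a version where `.ix` is no longer available and where `to_dict` takes `orient` rather than `outtype`; the frames are rebuilt from the question's data, and the names `keys`, `mask`, and `result` are only illustrative.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "contig": [1, 1, 1, 1],
    "position": [14599, 14653, 14907, 14930],
    "tumor_f": [0.0, 0.4, 0.333333, 0.363636],
    "t_ref_count": [1, 3, 6, 7],
    "t_alt_count": [0, 2, 3, 4],
})
df2 = pd.DataFrame({"contig": [1, 1], "position": [14599, 14653]})

keys = ["contig", "position"]

# DataFrame.isin with a dict checks each column against its own list of values.
mask = df1[keys].isin(df2.to_dict(orient="list")).all(axis=1)

# Keep the rows where not every key column matched.
result = df1.loc[~mask]
print(result)
```

Note that this checks each column separately, so it is only equivalent to matching (contig, position) pairs when the values cannot be recombined across rows of df2.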
You can do this with the Series isin twice (works in 0.12):
In [21]: df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position'])
Out[21]:
0 True
1 True
2 False
3 False
dtype: bool
In [22]: ~(df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position']))
Out[22]:
0 False
1 False
2 True
3 True
dtype: bool
In [23]: df1[~(df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position']))]
Out[23]:
contig position tumor_f t_ref_count t_alt_count
2 1 14907 0.333333 6 3
3 1 14930 0.363636 7 4
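One caveat with testing the two columns independently: a row is flagged whenever its contig appears anywhere in df2 and its position appears anywhere in df2, even if that exact pair does not. The sketch below illustrates the difference with a hypothetical `df2_alt` (not from the question) and a df1 trimmed to its key columns. Comparing (contig, position) tuples through a MultiIndex is one possible way to match pairs, not necessarily the approach the answer intended.

```python
import pandas as pd

# Key columns of df1 from the question.
df1 = pd.DataFrame({
    "contig": [1, 1, 1, 1],
    "position": [14599, 14653, 14907, 14930],
})
# Hypothetical df2: position 14653 appears, but on contig 2 rather than contig 1.
df2_alt = pd.DataFrame({"contig": [1, 2], "position": [14599, 14653]})

keys = ["contig", "position"]

# Column-by-column test: flags (1, 14653) although that pair is not in df2_alt.
independent = (
    df1["contig"].isin(df2_alt["contig"])
    & df1["position"].isin(df2_alt["position"])
)

# Pairwise test: compare (contig, position) tuples via a MultiIndex.
pairwise = df1.set_index(keys).index.isin(df2_alt.set_index(keys).index)

kept_independent = df1[~independent]["position"].tolist()
kept_pairwise = df1[~pairwise]["position"].tolist()
print(kept_independent)  # [14907, 14930]
print(kept_pairwise)     # [14653, 14907, 14930]
```

With the question's own df2 both tests give the same result, because every row shares contig 1.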
Perhaps we can get a neat solution in 0.13 (using DataFrame's isin like in Tom's answer).
It feels like there ought to be a neat way to do this using an inner merge...
In [31]: pd.merge(df1, df2, how="inner")
Out[31]:
contig position tumor_f t_ref_count t_alt_count
0 1 14599 0.0 1 0
1 1 14653 0.4 3 2
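The inner merge returns the rows that do match, which is the complement of what the question asks for. A minimal sketch of one way to invert it, assuming a pandas version where `merge` accepts `indicator=True` (this was not available in the versions discussed above): do a left merge and keep only the rows marked `left_only`. The names `merged` and `result` are illustrative.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "contig": [1, 1, 1, 1],
    "position": [14599, 14653, 14907, 14930],
    "tumor_f": [0.0, 0.4, 0.333333, 0.363636],
    "t_ref_count": [1, 3, 6, 7],
    "t_alt_count": [0, 2, 3, 4],
})
df2 = pd.DataFrame({"contig": [1, 1], "position": [14599, 14653]})

keys = ["contig", "position"]

# indicator=True adds a "_merge" column saying where each row's keys were found.
merged = df1.merge(df2, on=keys, how="left", indicator=True)

# Rows found only in df1 are the ones with no matching pair in df2.
result = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Unlike the column-wise isin checks, the merge matches on the (contig, position) pair as a whole. The merge builds a new index, so the original df1 labels are not guaranteed to be preserved in general.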