Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

Tags:

python

pandas

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around a haven't found an answer to this question, so I'm hoping somebody more experienced than I has a solution.

Lets say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]) I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined with df2

OR

2) After df2 has been created, remove the rows (difference?) from df1 which df2 is composed of

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be create that looks like this:

df2=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 not in df2. Thanks!

like image 810
DMML Avatar asked May 23 '13 02:05

DMML


1 Answers

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]

Which makes most sense depends upon the context, but I think you get the idea.

like image 147
DSM Avatar answered Oct 23 '22 05:10

DSM