Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

Question

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around a haven't found an answer to this question, so I'm hoping somebody more experienced than I has a solution.

Lets say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]) I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined with df2

OR

2) After df2 has been created, remove the rows (difference?) from df1 which df2 is composed of

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be create that looks like this:

df2=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 not in df2. Thanks!

DSM · Accepted Answer

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]

Which makes most sense depends upon the context, but I think you get the idea.

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

Tags:

python

pandas

DMML

1 Answers

DSM

Recent Activity

Donate For Us

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

Tags:

python

pandas

DMML

1 Answers

DSM

Related questions

Recent Activity

Donate For Us