filter a pandas data frame on all rows that do NOT meet a condition [duplicate]

Tags: python, pandas

This seems simple, but I can't seem to figure it out. I know how to filter a pandas data frame to all rows that meet a condition, but when I want the opposite, I keep getting weird errors.

Here is the example. (Context: a simple board game where pieces sit on a grid. Given a coordinate, I want to return all adjacent pieces, but NOT the piece on that coordinate itself.)

import pandas as pd
import numpy as np

df = pd.DataFrame([[5,7, 'wolf'],
              [5,6,'cow'],
              [8, 2, 'rabbit'],
              [5, 3, 'rabbit'],
              [3, 2, 'cow'],
              [7, 5, 'rabbit']],
              columns = ['lat', 'long', 'type'])

coords = [5,7] #the coordinate I'm testing, a wolf

view = df[((coords[0] - 1) <= df['lat']) & (df['lat'] <= (coords[0] + 1)) \
    & ((coords[1] - 1) <= df['long']) & (df['long'] <= (coords[1] + 1))]

view = view[not ((coords[0] == view['lat']) & (coords[1] == view['long'])) ] 

print(view)

I thought the not should just negate the boolean inside the parentheses that followed, but this doesn't seem to be how it works.

I want it to return the cow at 5,6 but NOT the wolf at 5,7 (because that's the current piece). Just to double check my logic, I did

me = view[(coords[0] == view['lat']) & (coords[1] == view['long'])] 
print(me)

and this returned just the wolf, as I'd expected. So why can't I just put a not in front of that and get everything else? Or, more importantly, what do I do instead to get everything else?

seth127 asked Aug 22 '16 11:08



1 Answer

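NumPy, and therefore pandas, overloads the bitwise operators (&, |, ~) to work element-wise on boolean arrays, so you should replace not with ~. Python's not asks for a single truth value for the whole Series, which pandas refuses to give (it raises a ValueError saying the truth value of a Series is ambiguous). This is also the reason you are using & rather than and.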

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

print(df[~(df['a'] == 1)])
>>    a
   1  2
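
To see the difference directly, here is a small sketch (the names mask, raised and inverted are only for illustration) that applies both not and ~ to the same boolean Series. It assumes a reasonably recent pandas; the exact wording of the error message may vary between versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
mask = df['a'] == 1      # element-wise comparison, gives a boolean Series

try:
    not mask             # Python asks the Series for ONE truth value
    raised = False
except ValueError as err:
    raised = True        # pandas refuses: the truth value is ambiguous
    print(err)

inverted = ~mask         # element-wise negation, one result per row
print(inverted.tolist())  # [False, True]
```

The same reasoning applies to and / or versus & / |: the keyword forms want a single boolean, while the operator forms work row by row.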

And using your example:

import pandas as pd
import numpy as np

df = pd.DataFrame([[5,7, 'wolf'],
              [5,6,'cow'],
              [8, 2, 'rabbit'],
              [5, 3, 'rabbit'],
              [3, 2, 'cow'],
              [7, 5, 'rabbit']],
              columns = ['lat', 'long', 'type'])

coords = [5,7] #the coordinate I'm testing, a wolf

view = df[((coords[0] - 1) <= df['lat']) & (df['lat'] <= (coords[0] + 1)) \
    & ((coords[1] - 1) <= df['long']) & (df['long'] <= (coords[1] + 1))]

view = view[~ ((coords[0] == view['lat']) & (coords[1] == view['long'])) ] 

print(view)
>>    lat  long type
   1    5     6  cow
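
If you do this lookup often, the two filters can be combined into one step. The following is only a sketch: adjacent_pieces is a hypothetical helper name, not something pandas provides, and it relies on Series.between being inclusive on both ends by default, which matches the <= comparisons above.

```python
import pandas as pd

df = pd.DataFrame([[5, 7, 'wolf'],
                   [5, 6, 'cow'],
                   [8, 2, 'rabbit'],
                   [5, 3, 'rabbit'],
                   [3, 2, 'cow'],
                   [7, 5, 'rabbit']],
                  columns=['lat', 'long', 'type'])

def adjacent_pieces(frame, coords):
    """Hypothetical helper: rows within one cell of coords, excluding coords itself."""
    lat, long = coords
    # Rows inside the 3x3 neighbourhood (between is inclusive on both ends)
    near = (frame['lat'].between(lat - 1, lat + 1)
            & frame['long'].between(long - 1, long + 1))
    # Rows sitting exactly on the tested coordinate
    here = (frame['lat'] == lat) & (frame['long'] == long)
    return frame[near & ~here]

result = adjacent_pieces(df, [5, 7])
print(result)
```

By De Morgan's law, ~((lat == a) & (long == b)) can also be written as (lat != a) | (long != b), which some people find easier to read.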

DeepSpace answered Nov 04 '22 12:11