This seems simple, but I can't seem to figure it out. I know how to filter a pandas data frame to all rows that meet a condition, but when I want the opposite, I keep getting weird errors.
Here is the example. (Context: a simple board game where pieces are on a grid and we're trying to give it a coordinate and return all adjacent pieces, but NOT the actual piece on that actual coordinate)
import pandas as pd
import numpy as np
df = pd.DataFrame([[5,7, 'wolf'],
[5,6,'cow'],
[8, 2, 'rabbit'],
[5, 3, 'rabbit'],
[3, 2, 'cow'],
[7, 5, 'rabbit']],
columns = ['lat', 'long', 'type'])
coords = [5,7] #the coordinate I'm testing, a wolf
view = df[((coords[0] - 1) <= df['lat']) & (df['lat'] <= (coords[0] + 1)) \
& ((coords[1] - 1) <= df['long']) & (df['long'] <= (coords[1] + 1))]
view = view[not ((coords[0] == view['lat']) & (coords[1] == view['long'])) ]
print(view)
I thought the not should just negate the boolean inside the parentheses that followed, but this doesn't seem to be how it works.
I want it to return the cow at 5,6 but NOT the wolf at 5,7 (because that's the current piece). Just to double check my logic, I did
me = view[(coords[0] == view['lat']) & (coords[1] == view['long'])]
print(me)
and this returned just the wolf, as I'd expected. So why can't I just put a not in front of that and get everything else? Or, more importantly, what do I do instead to get everything else?
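Because numpy (and therefore pandas) overloads the bitwise operators for element-wise boolean logic, you should replace not with ~. Python's not tries to reduce the whole Series to a single True or False, which pandas rejects with a ValueError because the truth value of a Series is ambiguous. This is also the reason you are using & rather than and.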
import pandas as pd
df = pd.DataFrame({'a': [1, 2]})
print(df[~(df['a'] == 1)])
>>    a
   1  2
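To see the difference directly, here is a minimal sketch (the names mask, not_failed and inverted are only for illustration): not asks for one truth value for the whole Series and raises, while ~ flips each element.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
mask = df['a'] == 1  # boolean Series: [True, False]

# `not` needs a single True/False for the whole Series, so pandas raises.
try:
    not mask
    not_failed = False
except ValueError as err:
    not_failed = True
    print("not failed:", err)

# `~` inverts element by element and returns a new boolean Series.
inverted = ~mask
print(inverted.tolist())
```

The same applies to and / or versus & / |: the keyword forms want a single boolean, the operator forms work row by row.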
And using your example:
import pandas as pd
import numpy as np
df = pd.DataFrame([[5,7, 'wolf'],
[5,6,'cow'],
[8, 2, 'rabbit'],
[5, 3, 'rabbit'],
[3, 2, 'cow'],
[7, 5, 'rabbit']],
columns = ['lat', 'long', 'type'])
coords = [5,7] #the coordinate I'm testing, a wolf
view = df[((coords[0] - 1) <= df['lat']) & (df['lat'] <= (coords[0] + 1)) \
& ((coords[1] - 1) <= df['long']) & (df['long'] <= (coords[1] + 1))]
view = view[~ ((coords[0] == view['lat']) & (coords[1] == view['long'])) ]
print(view)
>>    lat  long  type
   1    5     6   cow
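If you need this for more than one coordinate, one possible way to package it is sketched below. The helper name adjacent_pieces is hypothetical, and Series.between (inclusive on both ends by default) is swapped in for the chained <= comparisons. The last lines show that, by De Morgan's law, the ~(A & B) mask can also be written with != and |.

```python
import pandas as pd

df = pd.DataFrame([[5, 7, 'wolf'],
                   [5, 6, 'cow'],
                   [8, 2, 'rabbit'],
                   [5, 3, 'rabbit'],
                   [3, 2, 'cow'],
                   [7, 5, 'rabbit']],
                  columns=['lat', 'long', 'type'])


def adjacent_pieces(df, coords):
    """Hypothetical helper: rows within one cell of coords, excluding coords itself."""
    lat, long = coords
    # True for rows inside the 3x3 window centred on coords.
    near = df['lat'].between(lat - 1, lat + 1) & df['long'].between(long - 1, long + 1)
    # True only for the piece sitting exactly on coords.
    is_self = (df['lat'] == lat) & (df['long'] == long)
    return df[near & ~is_self]


view = adjacent_pieces(df, [5, 7])
print(view)

# Equivalent "not on this square" mask without ~, via De Morgan's law.
not_self = (df['lat'] != 5) | (df['long'] != 7)
same_mask = (not_self == ~((df['lat'] == 5) & (df['long'] == 7))).all()
```

Combining both masks in one step also avoids filtering view a second time.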