I've been able to filter Pandas dataframe rows containing a number:
import pandas as pd
df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})
c1 c2 c3 c4
0 3 3 2.0 1
1 1 3 5.0 2
2 2 3 NaN 3
df1 = df[(df.values == 1)]
c1 c2 c3 c4
0 3 3 2.0 1
1 1 3 5.0 2
But if I try to filter by excluding a number, I get a really strange result with repeated rows:
df1 = df[(df.values != 1)]
c1 c2 c3 c4
0 3 3 2.0 1
0 3 3 2.0 1
0 3 3 2.0 1
1 1 3 5.0 2
1 1 3 5.0 2
1 1 3 5.0 2
2 2 3 NaN 3
2 2 3 NaN 3
2 2 3 NaN 3
2 2 3 NaN 3
Why is that? And how can I filter only the rows that don't contain the specified number?
Thanks in advance!
Look at this mask
In [88]: df.values != 1
Out[88]:
array([[ True, True, True, False],
[False, True, True, True],
[ True, True, True, True]])
Slicing follows the numpy convention: every True is repeated, so you get repeated rows in the output. You need an additional all to check whether each row is all True and return a single True/False per row.
df[(df.values != 1).all(1)]
Out[87]:
c1 c2 c3 c4
2 2 3 NaN 3
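To make the two steps visible, here is a small illustrative sketch (the np.nonzero call is my addition for demonstration, not part of the answer above):
import numpy as np
mask = df.values != 1       # 2-D boolean mask, one entry per cell
np.nonzero(mask)[0]         # row index of every True: array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
mask.all(1)                 # one boolean per row: array([False, False, True])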
Note: my intention was to reuse your code, so I didn't change it. A more concise version would be:
df[(df != 1).all(1)]
or
df[df.ne(1).all(1)]
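One detail worth noting with these variants: NaN compares unequal to 1, so ne(1) returns True for the NaN cell, and the row with the missing c3 value is still kept. A quick check (added here only for illustration):
df.ne(1).loc[2]    # c1 True, c2 True, c3 True (NaN != 1), c4 True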
Use DataFrame.any:
df[~df.eq(1).any(axis=1)]
Output:
c1 c2 c3 c4
2 2 3 NaN 3
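For clarity, the intermediate row mask looks like this (shown only as an illustration):
df.eq(1).any(axis=1)     # 0: True, 1: True, 2: False
~df.eq(1).any(axis=1)    # 0: False, 1: False, 2: True -> only row 2 is kept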
try this:
indexes = [x for x in range(len(df)) if 1 not in df.values[x]]  # row positions where 1 does not appear
df.iloc[indexes]
output:
c1 c2 c3 c4
2 2 3 NaN 3
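A vectorized alternative to the list comprehension (my suggestion, not part of the answer above) builds the same row mask with DataFrame.isin:
df[~df.isin([1]).any(axis=1)]    # True where a cell equals 1, negated per row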
Basically, use the comparison to create an index, invert the index, and then select the rows based on that index.
import pandas as pd
df = pd.DataFrame({'c1': [3, 1, 2, 1, 3],
'c2': [3, 3, 3, 2, 3],
'c3': [2, 5, None,3, 3],
'c4': [1, 2, 3, 1, 3]})
print(df)
# Create an index based on any row containing 1
index = df.values == 1
print(index)
# Collapse the 2-D index to one boolean per row, inverting it:
# i.e.
# [False False False True]  becomes False, since True is in the list
# [True False False False]  becomes False, since True is in the list
# [False False False False] becomes True, since True is NOT in the list
index = [True not in l for l in index]
# Pick out only the rows where the index is true
df1 = df[index]
print(df1)
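The same inversion can be written without the Python loop, for example with NumPy's any (offered as an aside, not a change to the answer's code):
index = ~(df.values == 1).any(axis=1)   # True only for rows that contain no 1
df1 = df[index]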