
How can I select rows from a Pandas dataframe where no value is equal to a number?

Tags: python, pandas

I've been able to filter Pandas dataframe rows containing a number:

import pandas as pd    

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3

df1 = df[(df.values == 1)]

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2

But if I try to filter by excluding a number, I get a really strange result with repeated rows:

df1 = df[(df.values != 1)]

   c1  c2   c3  c4
0   3   3  2.0   1
0   3   3  2.0   1
0   3   3  2.0   1
1   1   3  5.0   2
1   1   3  5.0   2
1   1   3  5.0   2
2   2   3  NaN   3
2   2   3  NaN   3
2   2   3  NaN   3
2   2   3  NaN   3

Why is that? And how can I filter only the rows that don't contain the specified number?

Thanks in advance!

Dribbler asked Oct 12 '19 18:10



4 Answers

Look at this mask

In [88]: df.values != 1
Out[88]:
array([[ True,  True,  True, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

The slicing follows NumPy conventions: a row is returned once for every True it contains, which is why rows are repeated in the output. You need an additional all to check that every value in a row is True, so that you get a single True/False per row.

df[(df.values != 1).all(1)]

Out[87]:
   c1  c2  c3  c4
2   2   3 NaN   3

Note: I wanted to reuse your code, so I didn't change it. A more concise version would be:

df[(df != 1).all(1)]

or

df[df.ne(1).all(1)]
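
To make the repetition visible, here is a minimal sketch that looks at the row positions of the True cells directly. It assumes the pandas version used in the question selected rows by those positions (the output in the question is consistent with that); `row_positions` and `row_mask` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

mask = df.values != 1          # 2-D boolean array, shape (3, 4)

# np.nonzero lists one row position per True cell, so a row with
# three True cells appears three times (NaN != 1 is True as well).
row_positions = np.nonzero(mask)[0]
print(row_positions)           # [0 0 0 1 1 1 2 2 2 2]

# Reducing along axis 1 yields exactly one boolean per row.
row_mask = mask.all(axis=1)
print(row_mask)                # [False False  True]
print(df[row_mask])
```

The counts 3, 3 and 4 match the number of times rows 0, 1 and 2 appear in the question's output.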
Andy L. answered Nov 14 '22 21:11


Use DataFrame.any:

df[~df.eq(1).any(axis=1)]

Output:

   c1  c2  c3  c4
2   2   3 NaN   3
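
"No cell equals 1" and "every cell differs from 1" are the same condition, so this gives the same rows as the `ne(1).all(...)` form. A small sketch of that equivalence follows; the helper name `rows_without_value` is hypothetical and only wraps the expression above.

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

def rows_without_value(frame, value):
    """Return the rows of `frame` in which no cell equals `value`."""
    # eq() marks matching cells; any(axis=1) flags rows with at least one match;
    # ~ keeps the rows that have none.
    return frame[~frame.eq(value).any(axis=1)]

result = rows_without_value(df, 1)
alt = df[df.ne(1).all(axis=1)]   # "every cell differs from 1"

print(result.equals(alt))        # True
```

Note that NaN never compares equal to a number, so a missing value does not cause a row to be dropped in either form.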
ansev answered Nov 14 '22 21:11


Try this:

indexes = [i for i, row in enumerate(df.values) if 1 not in row]  # positions of rows that do not contain 1
df.iloc[indexes]

Output:

   c1  c2  c3  c4
2   2   3 NaN   3
Alex answered Nov 14 '22 22:11


Basically, use the filter to create a boolean index, invert it row by row, and then select the rows based on that index.

import pandas as pd    

df = pd.DataFrame({'c1': [3, 1, 2,   1, 3], 
                   'c2': [3, 3, 3,   2, 3], 
                   'c3': [2, 5, None,3, 3], 
                   'c4': [1, 2, 3,   1, 3]})
print(df)
# Create a boolean index marking every cell that equals 1
index = df.values == 1
print(index)
# This collapses the index to one boolean per row and inverts it.
# I.e.
# [False False False  True] becomes False, since True is in the row
# [True False False False]  becomes False, since True is in the row
# [False False False False] becomes True,  since True is NOT in the row
index = [True not in row for row in index]
# Pick out only the rows where the index is True
df1 = df[index]
print(df1)
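
The per-row list comprehension can also be expressed as a NumPy reduction. This is a sketch of the same create, invert, select idea on the same data; `contains_one` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 1, 2,   1, 3],
                   'c2': [3, 3, 3,   2, 3],
                   'c3': [2, 5, None,3, 3],
                   'c4': [1, 2, 3,   1, 3]})

# One boolean per row: True if any cell in that row equals 1
contains_one = (df.values == 1).any(axis=1)
print(contains_one)            # [ True  True False  True False]

# Invert the row mask and keep only rows without a 1
df1 = df[~contains_one]
print(df1.index.tolist())      # [2, 4]
```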
RightmireM answered Nov 14 '22 22:11