In a pandas DataFrame, I have a column of boolean values. To filter to rows where the boolean is True, I can use df[df.column_x]. I thought that to filter to only rows where the column is False, I could use df[~df.column_x]. I feel like I have done this before, and have seen it as the accepted answer.
However, this fails because ~df.column_x converts the values to integers. See below.
import pandas as pd  # version 0.24.2
a = pd.Series(['a', 'a', 'a', 'a', 'b', 'a', 'b', 'b', 'b', 'b'])
b = pd.Series([True, True, True, True, True, False, False, False, False, False], dtype=bool)
c = pd.DataFrame(data=[a, b]).T
c.columns = ['Classification', 'Boolean']
print(~c.Boolean)
0 -2
1 -2
2 -2
3 -2
4 -2
5 -1
6 -1
7 -1
8 -1
9 -1
Name: Boolean, dtype: object
print(~b)
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
9 True
dtype: bool
Basically, I can use c[~b], but not c[~c.Boolean].
Am I just dreaming that this used to work?
Ah, since you created c by using the DataFrame constructor and then T, let us first look at what we have before T:
pd.DataFrame([a, b])
Out[610]:
      0     1     2     3     4      5      6      7      8      9
0     a     a     a     a     b      a      b      b      b      b
1  True  True  True  True  True  False  False  False  False  False
pandas makes each column hold only one dtype; if the values in a column do not share one, the column is converted to object.
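As a rough illustration of that rule, the sketch below uses small made-up Series (the names only_bools, mixed, and wide are hypothetical, not from the question) to show a homogeneous column keeping bool while mixed columns fall back to object:

```python
import pandas as pd

# A Series holding only booleans keeps the bool dtype.
only_bools = pd.Series([True, False])

# Mixing a string and a boolean forces the generic object dtype.
mixed = pd.Series(['a', True])

# A list of Series is laid out row by row, so every resulting
# column here contains one string and one boolean.
wide = pd.DataFrame([pd.Series(['a', 'b']), pd.Series([True, False])])

print(only_bools.dtype)
print(mixed.dtype)
print(wide.dtypes)
```

Here every column of wide should report object, which is the state the frame is in just before T is applied.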
After T, what data type do we have for each column? The dtypes in your c:
c.dtypes
Out[608]:
Classification object
Boolean object
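The Boolean column became object dtype, which is why you get unexpected output for ~c.Boolean: on an object column, ~ is applied to each Python bool as an integer, so True (1) becomes -2 and False (0) becomes -1.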
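To make the -2 and -1 values less mysterious, here is a minimal sketch. It assumes a small stand-in Series named obj (hypothetical, not the c from the question) and uses astype(bool), which is one possible repair for an existing frame rather than the fix proposed in this answer:

```python
import pandas as pd

# bool is a subclass of int, so True behaves like 1 and False like 0.
assert isinstance(True, int)

# Bitwise NOT on those integers gives the values seen in the question.
inverted_ints = [~int(True), ~int(False)]

# Stand-in for c.Boolean after the transpose: Python bools in an object column.
obj = pd.Series([True, False], dtype=object)

# Casting back to bool restores element-wise logical negation.
fixed = ~obj.astype(bool)

print(inverted_ints)   # [-2, -1]
print(fixed.tolist())  # [False, True]
```

With the question's frame, the same idea would read c[~c.Boolean.astype(bool)], assuming the column contains only True and False.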
How to fix it? Use concat:
c = pd.concat([a, b], axis=1)  # pass axis by keyword; newer pandas rejects it positionally
c.columns = ['Classification', 'Boolean']
~c.Boolean
Out[616]:
0 False
1 False
2 False
3 False
4 False
5 True
6 True
7 True
8 True
9 True
Name: Boolean, dtype: bool
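An alternative that should also work, though it is not part of the original answer, is to build the frame from a dict of Series so that no transpose is involved and each column keeps its own dtype. The variable false_rows below is a hypothetical name for the filtered result:

```python
import pandas as pd

a = pd.Series(['a', 'a', 'a', 'a', 'b', 'a', 'b', 'b', 'b', 'b'])
b = pd.Series([True] * 5 + [False] * 5, dtype=bool)

# Each dict entry becomes one column, so the bool dtype of b is preserved.
c = pd.DataFrame({'Classification': a, 'Boolean': b})

# With a genuine bool column, ~ negates logically and the filter works.
false_rows = c[~c.Boolean]

print(c.dtypes)
print(false_rows)
```

This keeps the original goal intact: false_rows contains only the rows where Boolean is False.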