Bool and missing values in pandas

Tags: python, pandas, dataframe

I am trying to figure out whether a column in a pandas dataframe is boolean (and, if so, whether it has missing values and so on).

To test the function I wrote, I tried to create a dataframe with a boolean column that contains missing values. However, it seems that missing values are only ever handled in an 'untyped' way, and there is some weird behaviour:

> import numpy as np
> import pandas as pd
> boolean = pd.Series([True, False, None])
> print(boolean)

0     True
1    False
2     None
dtype: object

So the moment you put None into the list, the column is treated as object, because pandas cannot store a mix of bool and type(None)=NoneType in a plain bool column. The same thing happens with math.nan and numpy.nan. The weirdest things happen when you try to force pandas into an area it does not want to go to :-)

> boolean = pd.Series([True, False, np.nan]).astype(bool)
> print(boolean)
0     True
1    False
2     True
dtype: bool

So 'np.nan' is being cast to 'True'?

Questions:

1. Given a data table where one column is of type 'object' but is in fact a boolean column with missing values: how do I figure that out? After filtering for the non-missing values it is still of type 'object'... Do I need to try casting every column into every imaginable data type, catching the failures, in order to see the true nature of the columns?
2. I guess there is a logical explanation for why np.nan is cast to True, but this is unwanted behaviour of pandas/python itself, right? So should I file a bug report?


asked Aug 28 '19 13:08

Fabian Werner

1 Answer

Q1: I would start with combining

np.any(pd.isna(boolean))

to check whether the column contains any missing values, and with

set(boolean)

you can check whether it contains only True, False and None. Combined with filtering out the missing values (and, if you prefer, typecasting the rest to bool) you should be done.
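The two checks above could be combined roughly as follows. This is only a sketch: the helper name describe_bool_column and the shape of its return value are made up for illustration, and it assumes that a column should count as boolean only if every non-missing value is an actual bool. It uses isinstance rather than a set comparison because 1 == True in Python, so set([1, 0]) would pass a naive comparison against {True, False}.

```python
import numpy as np
import pandas as pd


def describe_bool_column(column: pd.Series) -> dict:
    """Hypothetical helper: is the column boolean once missing values are dropped?"""
    non_missing = column.dropna()
    # Accept Python bool and numpy.bool_ only; the integers 1/0 are rejected on purpose.
    only_bools = all(isinstance(v, (bool, np.bool_)) for v in non_missing)
    return {
        # An all-missing column is not reported as boolean.
        "is_boolean": len(non_missing) > 0 and only_bools,
        "has_missing": bool(column.isna().any()),
    }


mixed = pd.Series([True, False, None])   # object dtype, but boolean in substance
text = pd.Series(["yes", "no", None])    # object dtype, not boolean

print(describe_bool_column(mixed))
print(describe_bool_column(text))
```

Applied to a whole dataframe, something like {name: describe_bool_column(df[name]) for name in df.columns} would give one report per column without a try/except cast for every type.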

Q2: see comment of @WeNYoBen
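As background for Q2 (not a restatement of that comment), here is a small sketch of why the cast in the question behaves as it does. NaN is a float that is not zero, so Python treats it as truthy, and astype(bool) follows the same rule. The sketch also assumes pandas 1.0 or newer, where the nullable "boolean" extension dtype can hold True, False and a missing value side by side.

```python
import numpy as np
import pandas as pd

# NaN is a float that is not equal to zero, so it is truthy in plain Python.
nan_is_truthy = bool(np.nan)

# astype(bool) applies the same truthiness rule element by element.
forced = pd.Series([True, False, np.nan]).astype(bool)

# The nullable "boolean" extension dtype (pandas 1.0 or newer) keeps the gap as <NA>.
nullable = pd.Series([True, False, None], dtype="boolean")

print(nan_is_truthy)
print(forced.tolist())
print(nullable.dtype, nullable.isna().tolist())
```

Whether the nullable dtype fits depends on the pandas version in use and on how the rest of the code treats <NA>.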

answered Oct 31 '22 18:10

Sosel
