How does pandas handle comparing a column of dtype "object" with an integer?

Tags: python, pandas, dataframe, comparison-operators

What rule does pandas use when a column of dtype "object" is compared with an integer? Here is my session:

In [334]: df
Out[334]: 
     c1    c2        c3  c4
id1   1    li -0.367860   5
id2   2  zhao -0.596926   5
id3   3   sun  0.493806   5
id4   4  wang -0.311407   5
id5   5  wang  0.253646   5

In [335]: df < 2
Out[335]: 
        c1    c2    c3     c4
id1   True  True  True  False
id2  False  True  True  False
id3  False  True  True  False
id4  False  True  True  False
id5  False  True  True  False

In [336]: df.dtypes
Out[336]: 
c1      int64
c2     object
c3    float64
c4      int64
dtype: object

Why is every value in the "c2" column True?

P.S. I also tried:

In [333]: np.less(np.array(["s","b"]),2)
Out[333]: NotImplemented

asked Aug 18 '18 12:08

BO.LI

1 Answer

For DataFrames, comparison with a scalar always returns a DataFrame having all Boolean columns.

I don't think it's documented anywhere officially, but there's a comment in the source code (see below) confirming the intended behaviour:

[for] straight boolean comparisons [between a DataFrame and a scalar] we want to allow all columns (regardless of dtype to pass thru) See #4537 for discussion.

In practice, this means that all comparisons for every column must return either True or False. Any invalid comparison (such as 'li' < 2) should default to one of these Boolean values.

Put simply, the pandas developers decided that it should default to True.

There's some discussion of this behaviour in #4537 and some argument to use False instead, or restrict the comparison to only columns with compatible types, but the ticket was closed and no code was changed.
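If the default of True is not what you want, the two alternatives raised in that discussion can be approximated by hand. The following is a minimal sketch, not something taken from the pandas source: it rebuilds the frame from the question (the values are copied from the output shown there) and uses `select_dtypes` and `pd.to_numeric`, which are my own choice of tools. It also avoids relying on the fill-with-True behaviour, which may differ between pandas versions.

```python
import pandas as pd

# Frame rebuilt from the question's output (values copied by hand).
df = pd.DataFrame(
    {
        "c1": [1, 2, 3, 4, 5],
        "c2": ["li", "zhao", "sun", "wang", "wang"],
        "c3": [-0.367860, -0.596926, 0.493806, -0.311407, 0.253646],
        "c4": [5, 5, 5, 5, 5],
    },
    index=["id1", "id2", "id3", "id4", "id5"],
)

# Option 1: compare only the numeric columns, so "c2" is left out entirely.
numeric_mask = df.select_dtypes(include="number") < 2

# Option 2: coerce a column to numbers first. Strings that cannot be
# parsed become NaN, and NaN < 2 evaluates to False rather than True.
c2_mask = pd.to_numeric(df["c2"], errors="coerce") < 2

print(numeric_mask)
print(c2_mask)
```

Option 1 drops the incompatible column from the result; option 2 keeps it but marks every non-numeric entry as False.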

If you're interested, you can see where the default value is used for invalid comparisons in an internal method found in ops.py:

def _comp_method_FRAME(cls, func, special):
    str_rep = _get_opstr(func, cls)
    op_name = _get_op_name(func, special)

    @Appender('Wrapper for comparison method {name}'.format(name=op_name))
    def f(self, other):
        if isinstance(other, ABCDataFrame):
            # Another DataFrame
            if not self._indexed_same(other):
                raise ValueError('Can only compare identically-labeled '
                                 'DataFrame objects')
            return self._compare_frame(other, func, str_rep)

        elif isinstance(other, ABCSeries):
            return _combine_series_frame(self, other, func,
                                         fill_value=None, axis=None,
                                         level=None, try_cast=False)
        else:

            # straight boolean comparisons we want to allow all columns
            # (regardless of dtype to pass thru) See #4537 for discussion.
            res = self._combine_const(other, func,
                                      errors='ignore',
                                      try_cast=False)
            return res.fillna(True).astype(bool)

    f.__name__ = op_name
    return f

The else block is the one we're interested in for the scalar case.

Note the errors='ignore' argument, meaning an invalid comparison will return NaN (instead of raising an error). The res.fillna(True) fills these failed comparisons with True.
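The two steps described above (ignore the failed comparison, then fill with True) can be imitated in plain Python. This is only an illustration of the logic, assuming Python 3, where 'li' < 2 raises TypeError; the helper names `compare_ignoring_errors` and `fill_failed_with_true` are hypothetical and do not exist in pandas.

```python
import math
import operator


def compare_ignoring_errors(values, scalar, op):
    """Compare each value with scalar; record NaN where the comparison is invalid."""
    results = []
    for value in values:
        try:
            results.append(bool(op(value, scalar)))
        except TypeError:
            # Invalid comparison such as 'li' < 2 on Python 3:
            # keep a NaN placeholder instead of raising.
            results.append(math.nan)
    return results


def fill_failed_with_true(results):
    """Replace NaN placeholders with True and cast everything to bool."""
    return [
        True if isinstance(r, float) and math.isnan(r) else bool(r)
        for r in results
    ]


c1 = [1, 2, 3, 4, 5]
c2 = ["li", "zhao", "sun", "wang", "wang"]

c1_mask = fill_failed_with_true(compare_ignoring_errors(c1, 2, operator.lt))
c2_mask = fill_failed_with_true(compare_ignoring_errors(c2, 2, operator.lt))

print(c1_mask)  # [True, False, False, False, False]
print(c2_mask)  # [True, True, True, True, True]
```

The second list reproduces the all-True "c2" column from the question: every comparison fails, so every slot is filled with True.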

answered Nov 02 '22 08:11

Alex Riley
