
I can't compare dataframe to a string! But I can compare its transpose

Consider the DataFrame df:

import pandas as pd

df = pd.DataFrame({
    1: [1, 2],
    2: ['a', 3],
    3: [None, 7]
})

df

   1  2    3
0  1  a  NaN
1  2  3  7.0
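Checking the column dtypes shows what pandas inferred for each column. This is a quick check, assuming a reasonably recent pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    1: [1, 2],
    2: ['a', 3],
    3: [None, 7]
})

# pandas infers one dtype per column: the None in column 3 becomes NaN,
# so that column is float64, and only column 2 (mixed str/int) is object
print(df.dtypes)  # column 1: int64, column 2: object, column 3: float64
```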

When I compare it with a string, I get an error:

df == 'a'
TypeError: Could not compare ['a'] with block values

However, taking the transpose fixes the problem?!

(df.T == 'a').T

       1      2      3
0  False   True  False
1  False  False  False

What is this error? Is it something I can fix with how I'm constructing my dataframe? What is different about comparing to the transpose?

asked Jul 27 '17 at 17:07 by piRSquared


1 Answer

When creating your DataFrame, pass dtype=object:

In [1013]: df = pd.DataFrame({
      ...:     1: [1, 2],
      ...:     2: ['a', 3],
      ...:     3: [None, 7]
      ...: }, dtype=object)

In [1014]: df
Out[1014]: 
   1  2     3
0  1  a  None
1  2  3     7

Now, you can compare without transposition:

In [1015]: df == 'a'
Out[1015]: 
       1      2      3
0  False   True  False
1  False  False  False
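If the DataFrame already exists, converting it with astype(object) should have the same effect as passing dtype=object to the constructor. A minimal sketch, assuming the same data as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    1: [1, 2],
    2: ['a', 3],
    3: [None, 7]
})

# Cast every column to object dtype so each cell is compared as a Python object
obj = df.astype(object)
mask = obj == 'a'
print(mask)
```

One difference from the constructor approach: the missing value in column 3 was already converted to NaN when df was built, so it stays NaN after astype(object) instead of showing as None. That does not affect the comparison with 'a'.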

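I believe that, to begin with, your columns are not object dtype: pandas coerces each column to a more specific dtype wherever it can (int64, object and float64 here). Transposition changes that, because the mixed values can only be held together in a single object array, so every column of the transpose ends up as object.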
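The transposition claim can be checked directly. Because the frame mixes int64, object and float64 columns, its combined values array has object dtype, and the transpose is built from that array. A quick check, assuming a recent pandas version (to_numpy was added in 0.24):

```python
import pandas as pd

df = pd.DataFrame({
    1: [1, 2],
    2: ['a', 3],
    3: [None, 7]
})

# The common dtype of int64, object and float64 columns is object
print(df.to_numpy().dtype)  # object

# Every column of the transpose inherits that object dtype,
# which is why (df.T == 'a') compares cell by cell without error
print(df.T.dtypes)
```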


I found this in the pandas source code (pandas/internals.py):

if not isinstance(result, np.ndarray):
    # differentiate between an invalid ndarray-ndarray comparison
    # and an invalid type comparison
    ...
    raise TypeError('Could not compare [%s] with block values' %
                    repr(other))

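In other words, when a block cannot be compared with the value elementwise because of its dtype (here, the string 'a' against the int64 and float64 blocks), the comparison does not produce an ndarray and this error is raised.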
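Another option, not part of the answer above, is DataFrame.isin, which tests each cell for membership in a list of values and does not rely on comparing a whole numeric block with a string. A minimal sketch, assuming the original (non-object) DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame({
    1: [1, 2],
    2: ['a', 3],
    3: [None, 7]
})

# isin checks every cell against the given values, whatever the column dtype
mask = df.isin(['a'])
print(mask)
```

This leaves the original dtypes of df untouched, which can matter if the numeric columns are used for arithmetic afterwards.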

answered Nov 02 '22 at 20:11 by cs95