Merging two dataframes with pd.NA in merge column yields 'TypeError: boolean value of NA is ambiguous'

Question

With pandas 1.0.1, I'm unable to merge two dataframes if the merge column contains pd.NA:

df = df.merge(df2, on=some_column)

yields

File /home/torstein/code/fintechdb/Sheets/sheets/gild.py, line 42, in gild
    df = df.merge(df2, on=some_column)
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py, line 7297, in merge
    validate=validate,
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 88, in merge
    return op.get_result()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 643, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 862, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 841, in _get_join_indexers
    self.left_join_keys, self.right_join_keys, sort=self.sort, how=self.how
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1311, in _get_join_indexers
    zipped = zip(*mapped)
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1309, in <genexpr>
    for n in range(len(left_keys))
File /home/torstein/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py, line 1918, in _factorize_keys
    rlab = rizer.factorize(rk)
File pandas/_libs/hashtable.pyx, line 77, in pandas._libs.hashtable.Factorizer.factorize
File pandas/_libs/hashtable_class_helper.pxi, line 1817, in pandas._libs.hashtable.PyObjectHashTable.get_labels
File pandas/_libs/hashtable_class_helper.pxi, line 1732, in pandas._libs.hashtable.PyObjectHashTable._unique
File pandas/_libs/missing.pyx, line 360, in pandas._libs.missing.NAType.__bool__

TypeError: boolean value of NA is ambiguous

while this works:

df[some_column].fillna(np.nan, inplace=True)
df2[some_column].fillna(np.nan, inplace=True)
df = df.merge(df2, on=some_column)
# Works

If I instead do

df[some_column].fillna(pd.NA, inplace=True)

then the error returns.

Celius Stingher · Accepted Answer

This has to do with pd.NA being introduced in pandas 1.0.0 and with how the pandas team decided it should behave in a boolean context. Also, keep in mind that it is an experimental feature, so it shouldn't be used for anything but experimenting:

Warning Experimental: the behaviour of pd.NA can still change without warning.

Another part of the pandas documentation, the section on working with missing values, is where I believe you can find both the reason and the answer you are looking for:

NA in a boolean context: Since the actual value of an NA is unknown, it is ambiguous to convert NA to a boolean value. The following raises an error: TypeError: boolean value of NA is ambiguous
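A minimal sketch of that behaviour, assuming pandas 1.0 or later is installed (the variable name `raised` is only illustrative):

```python
import pandas as pd

# Converting pd.NA to a boolean raises, because its truth value is unknown.
try:
    bool(pd.NA)
    raised = False
except TypeError:
    raised = True

print(raised)

# pd.isna() gives a definite answer for a scalar, so it is safe inside an `if`.
print(pd.isna(pd.NA))
```

Any code path that evaluates `if value:` or `value == other` as a plain boolean can hit the same error when `value` is pd.NA, which is presumably what happens inside the merge in the traceback above.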

Furthermore, it provides a valuable piece of advice:

"This also means that pd.NA cannot be used in a context where it is evaluated to a boolean, such as if condition: ... where condition can potentially be pd.NA. In such cases, isna() can be used to check for pd.NA or condition being pd.NA can be avoided, for example by filling missing values beforehand."
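Applied to the merge in the question, "filling missing values beforehand" could look roughly like the sketch below. The frames, the column names, and the helper `na_to_nan` are hypothetical, not taken from the question, and the example does not try to reproduce the error itself, which may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; the object-dtype key column holds pd.NA.
left = pd.DataFrame({
    "key": pd.Series(["a", "b", pd.NA], dtype=object),
    "x": [1, 2, 3],
})
right = pd.DataFrame({
    "key": pd.Series(["a", pd.NA, "c"], dtype=object),
    "y": [10, 20, 30],
})

# Option 1: drop rows whose key is missing, then merge.
merged_dropped = left.dropna(subset=["key"]).merge(
    right.dropna(subset=["key"]), on="key"
)

# Option 2: swap pd.NA for np.nan in the key column, then merge.
def na_to_nan(frame, column):
    """Return a copy of `frame` with missing values in `column` set to np.nan."""
    out = frame.copy()
    out[column] = out[column].where(out[column].notna(), np.nan)
    return out

merged_filled = na_to_nan(left, "key").merge(na_to_nan(right, "key"), on="key")

print(merged_dropped)
print(merged_filled)
```

Which option fits depends on the data: the pandas merge documentation notes that rows whose keys are both null are matched against each other, so option 2 keeps such rows paired while option 1 discards them.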

autonopy · Answer

I decided that the pd.NA instances in my data were valid, so I needed to deal with them rather than fill them with something like fillna(). If you're in the same situation, convert each value from pd.NA to either True or False by simply using pd.isna(val). Only you can decide whether a null should come out True or False, but here's a simple example:

val = pd.NA
if pd.isna(val):
    print('it is null')
else:
    print('it is not null')

prints: it is null

Then,

val = 7
if pd.isna(val):
    print('it is null')
else:
    print('it is not null')

prints: it is not null
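If the same decision is needed in several places, the check can be wrapped in a small function. This is only a sketch: `to_bool` and its `na_as` parameter are hypothetical names, and it assumes `val` is a scalar (for a list or Series, pd.isna returns an array instead of a single boolean).

```python
import pandas as pd

def to_bool(val, na_as=False):
    """Return bool(val), or `na_as` when val is missing (hypothetical helper)."""
    if pd.isna(val):
        return na_as
    return bool(val)

print(to_bool(pd.NA))              # False
print(to_bool(pd.NA, na_as=True))  # True
print(to_bool(7))                  # True
```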

Hope this helps others trying to find a definitive course of action (Celius's answer is accurate, but I wanted to provide actionable code for those struggling with this).

Tags:

python

python-3.x

pandas

tsorn
