Extension dtypes in pandas appear to have a bug with query

Question

Update (2/19/2019): I opened a report in the numexpr tracker: https://github.com/pydata/numexpr/issues/331

The pandas report is: https://github.com/pandas-dev/pandas/issues/25369

Unless I'm doing something I'm not supposed to, the new nullable integer extension dtypes appear to have a bug with the query method on DataFrame (the problem seems to be in the numexpr package):

import pandas as pd

df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})
df_test.query("col_test != 6")  # fails with the ValueError quoted in this question

Last lines of the long error message are:

File "...\site_packages\numexpr\necompiler.py", line 822, in evaluate
    zip(names, arguments)]
File "...\site_packages\numexpr\necompiler.py", line 821, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
File "...\site_packages\numexpr\necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

The non-extension dtypes work fine:

import numpy as np

df_test = df_test.astype(dtype={"col_test": np.int32})
df_test.query("col_test != 6")  # works as expected

(P.S. As an entirely separate issue, passing the dtype directly to the pd.DataFrame constructor doesn't work either, which also seems buggy.)

Thanks.

cs95 · Accepted Answer

The nullable integer extension dtypes are new in pandas 0.24, and there are still a lot of kinks to iron out.

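That said, this looks like a compatibility issue between numexpr and pandas: per the traceback, numexpr's getType receives an array whose dtype is object and rejects it as an unknown type. This definitely looks buggy, and until it is fixed, we will have to fall back to the 'python' engine.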

df_test.query('col_test != 6', engine='python')

   col_test
0         4
1         5

(More information on query/eval: Dynamic Expression Evaluation in pandas using pd.eval())
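If the same code has to run both where the default engine works and where it raises, the fallback described above can be wrapped in a small helper. This is only a sketch: query_with_fallback is a hypothetical name, not part of pandas, and it assumes the failure surfaces as a ValueError, as in the traceback from the question.

```python
import pandas as pd


def query_with_fallback(df, expr):
    """Hypothetical helper: try the default engine, then retry with engine='python'."""
    try:
        return df.query(expr)
    except ValueError:
        # Assumption: the numexpr incompatibility shows up as a ValueError,
        # as in the question's traceback. Other ValueErrors are retried too
        # and will simply be raised again by the 'python' engine.
        return df.query(expr, engine="python")


df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})

result = query_with_fallback(df_test, "col_test != 6")
print(result)
```

Catching ValueError broadly keeps the sketch short; a stricter version could inspect the exception message before retrying.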

That said, you could skip query entirely and filter with loc:

df_test.loc[df_test['col_test'] != 6]

   col_test
0         4
1         5

This is likely to be a lot faster (using engine='python' offers no performance benefit over loc).
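As a quick sanity check that the two approaches select the same rows, here is a minimal sketch. The names via_query and via_loc are illustrative only, and the frame is built with pd.array rather than astype purely for brevity.

```python
import pandas as pd

# Build the nullable Int32 column up front with pd.array.
df_test = pd.DataFrame({"col_test": pd.array([4, 5, 6], dtype="Int32")})

# Workaround from this answer: evaluate the expression with the 'python' engine.
via_query = df_test.query("col_test != 6", engine="python")

# Plain boolean indexing, which never goes through numexpr.
via_loc = df_test.loc[df_test["col_test"] != 6]

# Both frames should hold the same rows with the same dtype.
print(via_query.equals(via_loc))
```

Since boolean indexing bypasses the expression evaluator altogether, it is unaffected by the numexpr issue.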
