
Instability of pandas dataframe calculations

I'm wondering whether anyone has seen this problem with Pandas before. Basically, I'm trying to add, multiply, and divide DataFrames element-by-element (all the frames have identical indexes and columns), but Pandas is spitting out different results for the same calculation performed successively.

An image of some example output is shown below. I've used .values in the code below for display purposes, but the instability also happens when using .add(), .mul(), or .div(). For example, if I repeatedly enter N11.add(N00), I usually get the correct answer, but occasionally (roughly every 4th or 5th time) I get a DataFrame filled with 0s.

[Image: example output]

If it matters, I'm on Windows 10 using an Anaconda distribution of Pandas 0.17.0 (with Python 2.7.10 on Spyder 2.3.7). The frames that I am working with are large (6856 by 12511). Has anyone else encountered this problem? Is this a known issue or am I doing something wrong?
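For anyone trying to check whether their own install is affected, here is a minimal sketch of a repeatability test. It repeats the same `.add()` call and compares each result with a sum computed directly on the NumPy arrays. The helper name `count_inconsistent_runs`, the frame sizes, and the repeat count are made up for illustration; the names `N00` and `N11` are borrowed from the question, but the data here is random. pandas only hands sufficiently large frames to numexpr, so the small frames below may not trigger the problem; frames closer to the real size are a better test.

```python
import numpy as np
import pandas as pd


def count_inconsistent_runs(a, b, repeats=20):
    """Repeat a.add(b) and count the runs that differ from a plain NumPy sum."""
    # Reference result computed on the raw arrays, without going through DataFrame.add.
    expected = a.values + b.values
    bad = 0
    for _ in range(repeats):
        result = a.add(b).values
        if not np.array_equal(result, expected):
            bad += 1
    return bad


# Hypothetical stand-ins for the frames in the question (random data, smaller shape).
rng = np.random.RandomState(0)
N00 = pd.DataFrame(rng.rand(200, 300))
N11 = pd.DataFrame(rng.rand(200, 300))

bad_runs = count_inconsistent_runs(N11, N00)
print("inconsistent runs:", bad_runs)
```

On a healthy install the count should be zero; any non-zero count means the same calculation is not giving the same answer every time.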

Arisdawdle asked Oct 19 '22 22:10


1 Answer

I encountered a similar issue today and it was caused by a bug in numexpr 2.4.4. It seems to be biting other pandas users in various ways, as reported in this pandas ticket and others linked to it.

Upgrading numexpr to 2.4.6 solved the problem for us, but it looks like any version that's not 2.4.4 should be fine!
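A quick way to see which numexpr version is installed is sketched below. The last part is an assumption on my side rather than something we tested: if upgrading is not possible right away, switching off pandas' use of numexpr through the `compute.use_numexpr` option should sidestep the affected code path, at some cost in speed on large frames.

```python
import pandas as pd

# numexpr is an optional dependency of pandas, so it may not be installed at all.
try:
    import numexpr
    numexpr_version = numexpr.__version__
except ImportError:
    numexpr_version = None

print("numexpr version:", numexpr_version)

if numexpr_version == "2.4.4":
    # Assumed stopgap: make pandas fall back to plain NumPy evaluation.
    pd.set_option("compute.use_numexpr", False)
    print("numexpr 2.4.4 detected; disabled in pandas until it is upgraded")
```

The upgrade itself would normally be `conda update numexpr` on an Anaconda install, or `pip install --upgrade numexpr` otherwise.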

mactyr answered Oct 22 '22 11:10