I'm working on numerical simulations and ran into an issue with large NumPy arrays (~26 GB) on Linux with 128 GB of RAM. The arrays are of type complex128.
I query the minimum and maximum values of the real and imaginary parts:
import numpy as np
import numba


@numba.jit(cache=True)
def minmaxrealimag(x):
    # Single pass over a 1-D complex array, tracking the extrema of both parts.
    rmaximum = x[0].real
    rminimum = x[0].real
    imaximum = x[0].imag
    iminimum = x[0].imag
    for i in x[1:]:
        if i.real > rmaximum:
            rmaximum = i.real
        elif i.real < rminimum:
            rminimum = i.real
        if i.imag > imaximum:
            imaximum = i.imag
        elif i.imag < iminimum:
            iminimum = i.imag
    return (rminimum, rmaximum, iminimum, imaximum)


testn = 2048
testx = 1024

field = np.empty((testx, testx), dtype=complex)
field.real[:] = np.random.rand(*field.shape)[:]
field.imag[:] = np.random.rand(*field.shape)[:]

profile = np.empty(testn, dtype=complex)
profile.real[:] = np.random.rand(*profile.shape)[:]
profile.imag[:] = np.random.rand(*profile.shape)[:]

# Reference extrema, accumulated slice by slice on small arrays.
correctslice = field[:, :] * profile[0]
(rmin, rmax, imin, imax) = minmaxrealimag(correctslice.flatten())

test1 = np.empty((testn, testx, testx), dtype=complex)
for itau in range(testn):
    correctslice = field[:, :] * profile[itau]
    (rmin2, rmax2, imin2, imax2) = minmaxrealimag(correctslice.flatten())
    # Independent checks: one slice can extend both the minimum and the maximum.
    if rmin2 < rmin:
        rmin = rmin2
    if rmax2 > rmax:
        rmax = rmax2
    if imin2 < imin:
        imin = imin2
    if imax2 > imax:
        imax = imax2
    test1[itau] = correctslice[:, :]

# The two printed tuples should be identical.
print((rmin, rmax, imin, imax))
print(minmaxrealimag(test1.flatten()))
Simpler example (slower and less informative):
import numpy as np
testn = 2048
testx = 1024
field = np.empty((testn, testx, testx), dtype=complex)
field.real[:] = np.random.rand(*field.shape)[:]
field.imag[:] = np.random.rand(*field.shape)[:]
print(np.max(field.imag))
Sometimes everything goes fine, but usually the minimum and maximum of the imaginary parts are incorrect. In these examples the minimum is plausible but wrong, and the maximum is either NaN (very rare) or, more likely, close to the double-precision floating-point maximum, though a factor of two or four smaller. I've never seen it be Inf. From this I assume that this weird maximum value in the imaginary part has the following properties:
When I try to locate and explicitly assign a value (say, 1) to the offending pixel, the value remains unchanged.
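For anyone who wants to repeat that check, here is a minimal sketch of how the offending elements might be located and overwritten. It runs on a tiny array with one deliberately planted bad value; the helper name `find_out_of_range` and the bounds `lo`/`hi` are my own illustrative choices, and it assumes the imaginary parts are expected to lie in [0, 1] as in the simpler example above.

```python
import numpy as np


def find_out_of_range(arr, lo=0.0, hi=1.0):
    """Return the indices of elements whose imaginary part is NaN or outside [lo, hi]."""
    imag = arr.imag
    # NaN fails both comparisons, so negating the "in range" mask flags it too.
    bad = ~((imag >= lo) & (imag <= hi))
    return np.argwhere(bad)


rng = np.random.default_rng(0)
field = np.empty((4, 8, 8), dtype=complex)
field.real[:] = rng.random(field.shape)
field.imag[:] = rng.random(field.shape)
field[2, 3, 5] = complex(0.5, 1e300)  # planted bad value, standing in for the corruption

for idx in find_out_of_range(field):
    idx = tuple(idx)
    field[idx] = complex(field[idx].real, 1.0)  # overwrite the imaginary part
    # On healthy memory the read-back matches what was just written.
    print(idx, "write stuck:", field[idx].imag == 1.0)
```

On a full-size array the boolean masks take a few extra gigabytes themselves, so scanning one slice at a time is probably the safer way to run this.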
The issue isn't always replicable in these snippets, but it's consistently stalling my simulations that use multiple arrays of this size and frequently occupies >80% of system memory. The frustrating part is the lack of an error, segfault, or exception, or else I'd think this was a memory-safety issue. Not even a warning. I have no way to know if a simulation is going to work or going off the rails.
The obvious answer is to move to a distributed-memory model of parallelization, but I first want to know that the issue isn't on my end, aside from asking too much of my computer. I haven't tried this on a different computer, as none of my others has as much memory. However, the computer in question is behaving normally in every other way.
So yeah, seems like it's a memory issue, thanks to @KellyBundy for the suggestion that I run MemTest86.
There were so many errors in the test that it just gave up when it hit 100000. That's odd, because the system boots just fine and I've never had a problem with crashes (which is why I didn't immediately suspect a hardware problem). Even the simulations run fine (usually) until they reach a certain size. But the memory test showed a multitude of single-bit errors, always in the first two bytes. I'm not that experienced with this kind of problem, but I tested each of the four modules in each of the four DIMM slots individually, and they all failed every time. So I think it's probably either a PSU problem or a bad memory controller on the CPU, but until I can find a known-good PSU to swap in, I won't know which (I don't have access to a PSU tester). For reference, there's 128 GB of non-ECC UDIMM, which in hindsight may have been a little ambitious. The CPU is a Ryzen 9 3900X.
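Single-bit errors would also fit the strange maxima from the question. The sketch below is only an illustration of how one flipped bit in a float64 can turn a value below 1 into something just short of the floating-point maximum. I don't know which bit actually flipped on my machine; bit 62 (the most significant exponent bit) is an assumption chosen for the demonstration, and `flip_bit` is a hypothetical helper, not part of NumPy.

```python
import numpy as np


def flip_bit(value, bit):
    """Return `value` with one bit of its float64 encoding inverted (bit 0 = least significant)."""
    raw = np.array([value], dtype=np.float64).view(np.uint64)
    raw ^= np.uint64(1) << np.uint64(bit)
    return float(raw.view(np.float64)[0])


FLOAT_MAX = np.finfo(np.float64).max

for x in (0.75, 0.3, 1.5):
    y = flip_bit(x, 62)  # assumption: the top exponent bit is the one that flips
    print(f"{x} -> {y!r}  (ratio to float64 max: {y / FLOAT_MAX!r})")
```

With that bit set, a value in [0.5, 1) lands within a factor of two of the float64 maximum, a value in [0.25, 0.5) within a factor of four, and a value strictly between 1 and 2 becomes NaN. Inf would require the original value to be exactly 1.0.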