I need to calculate the mean in columns of an array with more than 1000 rows.
np.mean(some_array)
gives me
inf
as output
but I am pretty sure the values are fine. I am loading a CSV from here into my Data
variable, and the column 'Cement' looks "healthy" from my point of view.
In[254]:np.mean(Data[:230]['Cement'])
Out[254]:275.75
but if I increase the number of rows the problem starts:
In [259]:np.mean(Data[:237]['Cement'])
Out[259]:inf
But when I look at the data:
In [261]:Data[230:237]['Cement']
Out[261]:
array([[ 425. ],
[ 333. ],
[ 250.25],
[ 491. ],
[ 160. ],
[ 229.75],
[ 338. ]], dtype=float16)
I cannot find a reason for this behaviour.
P.S. This happens in Python 3.x using Wakari (cloud-based IPython).
Numpy Version '1.8.1'
I am loading the Data with:
import numpy as np

No_Col = 9
# Convert decimal commas to decimal points before parsing each field.
conv = lambda valstr: float(valstr.replace(',', '.'))
c = {}
for i in range(0, No_Col, 1):
    c[i] = conv
Data = np.genfromtxt(get_data, dtype=np.float16, delimiter='\t', skip_header=0, names=True, converters=c)
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
NumPy Introduction: NumPy is a Python library used for working with arrays. It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely. NumPy stands for Numerical Python.
mean() in Python: The arithmetic mean is the sum of the elements along an axis divided by the number of elements. The numpy.mean() function computes the arithmetic mean along the specified axis.
The nanmean() function calculates the mean of an array while ignoring NaN values, so NaN entries do not affect the result. axis: axis=1 computes the mean row-wise (one value per row) and axis=0 computes it column-wise (one value per column).
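As a small illustration of the axis argument and of nanmean(), here is a sketch on a made-up 3x2 array (the name `table` and its values are assumptions for the example, not data from the question):

```python
import numpy as np

# Hypothetical 3x2 table with one missing value.
table = np.array([[1.0, 2.0],
                  [3.0, np.nan],
                  [5.0, 6.0]])

# axis=0 collapses the rows, giving one mean per column.
col_means = np.mean(table, axis=0)        # second column contains NaN

# nanmean skips NaN entries instead of propagating them.
col_nanmeans = np.nanmean(table, axis=0)  # one mean per column, NaN ignored

# axis=1 collapses the columns, giving one mean per row.
row_nanmeans = np.nanmean(table, axis=1)

print(col_means, col_nanmeans, row_nanmeans)
```

Note that nanmean() only helps with NaN entries; it does not protect against an overflow to inf.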
I would guess that the problem is precision (as others have also commented). Quoting directly from the documentation for mean(), we see:
Notes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
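Since your array is of type float16, you have very limited precision and range: the largest finite float16 value is 65504. The first 230 values sum to roughly 230 × 275.75 ≈ 63,400, and the next seven add about 2,227, which pushes the running sum past that limit, so it overflows to inf before it is divided by the number of rows. Passing dtype=np.float64 to mean() should avoid the overflow. Also see the examples in the mean() documentation.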
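The overflow can be reproduced with a minimal sketch. It uses a hypothetical stand-in for the 'Cement' column (237 copies of 300.0, so the true sum is 71100), not the real data. The float16 sum is shown with np.sum, because newer NumPy releases document that mean() uses float32 intermediates for float16 input, so the bare np.mean call may no longer return inf there as it did on 1.8.1:

```python
import numpy as np

# Hypothetical stand-in for the 'Cement' column: 237 values of 300.0.
# Their true sum (71100) is larger than the float16 maximum (65504).
cement = np.full(237, 300.0, dtype=np.float16)

print(np.finfo(np.float16).max)  # largest finite float16 value

# Summing with a float16 result overflows; silence the overflow warning.
with np.errstate(over='ignore'):
    f16_sum = np.sum(cement)

# Asking for a float64 accumulator keeps the sum in range.
safe_mean = np.mean(cement, dtype=np.float64)

# Equivalent alternative: convert the data first, then average.
converted_mean = cement.astype(np.float64).mean()

print(f16_sum, safe_mean, converted_mean)
```

If memory allows, loading the file with a float64 dtype in the first place would avoid the issue for every later computation, not just for mean().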