I was trying to run a code snippet that looks like this:
import numpy as np
import time

def estimate_mutual_info(X, neurons, bins = 5):
    xy = np.histogram2d(X, neurons, bins)[0]
    x = np.histogram(X, bins)[0]
    y = np.histogram(neurons, bins)[0]
    ent_x = -1 * np.sum( x / np.sum(x) * np.log( x / np.sum(x)))
    ent_y = -1 * np.sum( y / np.sum(y) * np.log( y / np.sum(y)))
    ent_xy = -1 * np.sum( xy / np.sum(xy) * np.log( xy / np.sum(xy)))
    return (ent_x + ent_y - ent_xy)

tic = time.time()
X = np.random.rand(12000, 1200)
Y = np.random.rand(12000, 10)
for j in Y.T:
    mi = 0
    for i in range(X.shape[1]):
        mi += estimate_mutual_info(X.T[i], j, bins = 2)
    print(mi)
toc = time.time()
print(str(toc - tic) + " seconds")
To increase the speed, I used float16, hoping to see some improvement, but float16 was much slower than float32 and float64:
X = np.random.rand(12000, 1200).astype('float16')
Y = np.random.rand(12000, 10).astype('float16')
Changing them to float16 results in an execution time of 84.57 seconds, whereas float64 and float32 take 36.27 seconds and 33.25 seconds respectively. I am not sure what causes this poor performance for float16. My processor is 64-bit, and I am using Python 3.7 and numpy 1.16.2. I don't think a 64-bit processor treats 16-bit, 32-bit and 64-bit values the same way. Any correction and insight is much appreciated.
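A minimal sketch that reproduces the dtype gap in isolation (the array size and repeat count here are arbitrary illustrative choices, not the original 12000 x 1200 data): time the same histogram call on each dtype.

```python
import time

import numpy as np

# Hypothetical size for illustration; smaller than the original arrays.
base = np.random.rand(200_000)

for dtype in (np.float16, np.float32, np.float64):
    data = base.astype(dtype)
    tic = time.time()
    for _ in range(50):
        np.histogram(data, bins=2)  # same call the MI estimator makes
    toc = time.time()
    print(f"{np.dtype(dtype).name}: {toc - tic:.3f} seconds")
```

On hardware without native FP16 arithmetic, the float16 line should stand out, matching the timings reported above.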
Efficient training of modern neural networks often relies on lower-precision data types. On A100 GPUs, peak float16 matrix-multiplication and convolution throughput is 16x the peak float32 throughput.
NumPy's float64 scalar type can be roughly 10 times slower than Python's built-in float in scalar arithmetic. The gap is so pronounced that converting to float before the calculations and back afterwards can make such a program run about 3 times faster.
At least on Intel, float64 should be no slower than float32 for scalar arithmetic, since the x87 FPU does all its math in extended precision internally and narrower types have to be converted on the way in and out; the memory bus, however, also comes into play and favors the smaller type.
The most likely explanation is that your processor does not natively support FP16 arithmetic, so it is all being done in software, which is, of course, much slower.
In general, consumer Intel processors don't support FP16 operations (the F16C extension provides only conversion to and from float32, not arithmetic).
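A quick way to see that the slowdown is in the arithmetic itself rather than in the histogramming (array size and repeat count are arbitrary choices here): the same elementwise multiply is far slower on float16 than on float32 on CPUs without native FP16 arithmetic, even though the result dtype stays float16.

```python
import time

import numpy as np

a = np.random.rand(1_000_000)

for dtype in ("float16", "float32", "float64"):
    b = a.astype(dtype)
    tic = time.time()
    for _ in range(20):
        np.multiply(b, b)  # pure elementwise arithmetic, no histogramming
    toc = time.time()
    print(f"{dtype}: {toc - tic:.3f} seconds")
```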
It is happening because there is no equivalent of float16 in C.
Since NumPy's core is written in C, and C has no native 16-bit float type, NumPy has to emulate float16 arithmetic in software (typically by converting each value to float32, operating on it, and converting back).
(C's float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa, giving about 7 decimal digits of precision.)
Because of this emulation, float16 is slower than float32 or float64.
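The layout of each type can be checked from NumPy itself with `np.finfo` (its `precision` field is a conservative decimal-digit count); a small sketch:

```python
import numpy as np

# IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
# IEEE 754 binary32: 1 sign bit, 8 exponent bits, 23 mantissa bits.
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(f"{np.dtype(t).name}: {info.bits} bits, "
          f"~{info.precision} decimal digits, max {info.max}")
```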