The standard is float32, but I'm wondering under what conditions it's OK to use float16?
I've compared running the same convnet with both data types and haven't noticed any issues. With large datasets I prefer float16 because I worry less about memory issues.
Mixed precision training offers a significant computational speedup by performing most operations in half precision, while keeping a small amount of data in single precision to preserve information in critical parts of the network.
TensorFlow, one of the most popular deep learning libraries, uses 32-bit floating-point precision by default.
Using mixed precision can improve performance by more than 3 times on modern GPUs and by 60% on TPUs. Today, most models use the float32 dtype, which takes 32 bits of memory. However, there are two lower-precision dtypes, float16 and bfloat16, each of which takes 16 bits of memory instead.
Benefits of mixed precision training: it speeds up memory-limited operations by accessing half the bytes compared to single precision, and it reduces the memory required for training, enabling larger models or larger minibatches.
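In TensorFlow/Keras this is essentially a one-liner. Below is a minimal sketch using the Keras mixed-precision API (TF 2.4+); the tiny convnet is just a placeholder for your own model:

```python
import tensorflow as tf

# Compute in float16, keep variables in float32 (loss scaling is applied
# automatically by model.fit under this policy).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```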
Surprisingly, it's totally OK to use 16 bits, and not just for fun, but in production as well. For example, in this video Jeff Dean talks about 16-bit calculations at Google, around 52:00. A quote from the slides:
Neural net training very tolerant of reduced precision
Since GPU memory is the main bottleneck in ML computation, there has been a lot of research on precision reduction. For example:
The Gupta et al. paper "Deep Learning with Limited Numerical Precision" is about fixed-point (not floating-point) 16-bit training, but with stochastic rounding (a toy sketch of that rounding scheme follows this list).
Courbariaux et al., "Training Deep Neural Networks with Low Precision Multiplications", is about 10-bit activations and 12-bit parameter updates.
And this is not the limit. In Courbariaux et al., "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", they discuss 1-bit activations and weights (though with higher precision for the gradients), which makes the forward pass super fast.
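To make the stochastic rounding idea from the Gupta et al. paper concrete, here is a toy NumPy sketch (the function name and the step size are my own choices, not from the paper): a value is rounded up with probability equal to its fractional distance to the next grid point, so the quantization error is zero in expectation.

```python
import numpy as np

def stochastic_round(x, step=2.0 ** -8):
    # Quantize x onto a fixed-point grid with spacing `step`, rounding up
    # with probability proportional to the remainder, so that on average
    # stochastic_round(x) equals x.
    scaled = np.asarray(x, dtype=np.float64) / step
    floor = np.floor(scaled)
    prob_up = scaled - floor                     # fractional part in [0, 1)
    round_up = np.random.rand(*scaled.shape) < prob_up
    return (floor + round_up) * step

print(stochastic_round([0.10, -0.37, 1.234]))    # varies slightly from run to run
```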
Of course, I can imagine some networks may require high precision for training, but I would recommend at least trying 16 bits when training a big network and switching to 32 bits if it proves to work worse.
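If you want to run that comparison end to end, here is a rough sketch of what I mean, assuming a Keras setup; the tiny convnet and the random data stand in for your real model and dataset:

```python
import numpy as np
import tensorflow as tf

# Dummy data standing in for a real dataset.
x = np.random.rand(256, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=256)

def build_model(dtype):
    # Build the same network with weights and activations in the given dtype,
    # so both precisions are trained on identical data.
    tf.keras.backend.set_floatx(dtype)
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train once in each precision; fall back to float32 only if float16 is clearly worse.
for dtype in ("float16", "float32"):
    history = build_model(dtype).fit(x, y, epochs=2, verbose=0)
    print(dtype, history.history["accuracy"][-1])
```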