
Deep neural network precision for image recognition: float or double?

Neural networks for image recognition can be really big: thousands of input/hidden neurons and millions of connections, which can take up a lot of computing resources.

In C++, float is commonly 32-bit and double 64-bit. There is not much difference in speed between them, but using floats can save a significant amount of memory.
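A rough illustration of that memory saving (a minimal sketch; the one-million-weight figure is just a hypothetical network size):

    #include <cstddef>
    #include <cstdio>

    int main() {
        // Hypothetical size: a network with one million connection weights.
        const std::size_t num_weights = 1000000;

        std::printf("sizeof(float)  = %zu bytes\n", sizeof(float));   // typically 4
        std::printf("sizeof(double) = %zu bytes\n", sizeof(double));  // typically 8

        // Storage for the weight array alone, in megabytes.
        std::printf("float weights:  %.1f MB\n",
                    num_weights * sizeof(float) / (1024.0 * 1024.0));
        std::printf("double weights: %.1f MB\n",
                    num_weights * sizeof(double) / (1024.0 * 1024.0));
        return 0;
    }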

Given a neural network that uses sigmoid as its activation function, and assuming we can choose per variable whether to use float or double: which variables could be floats, to save memory without making the network unable to perform?

Inputs and outputs for training/test data can definitely be floats, because they do not require double precision: colors in an image can only be in the range 0-255, and when normalized to a 0.0-1.0 scale the unit step is 1/255 ≈ 0.0039.
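For example, a minimal sketch of that normalization (the helper name normalize_pixels is just illustrative):

    #include <cstdint>
    #include <vector>

    // Map 8-bit pixel intensities (0-255) onto normalized float inputs (0.0-1.0).
    // The smallest step, 1/255 ~= 0.0039, is far coarser than float's ~7
    // significant decimal digits, so float precision is more than enough here.
    std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& pixels) {
        std::vector<float> inputs;
        inputs.reserve(pixels.size());
        for (std::uint8_t p : pixels) {
            inputs.push_back(static_cast<float>(p) / 255.0f);
        }
        return inputs;
    }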

1. What about the precision of hidden neurons' outputs: would it be safe to make them floats too?

A hidden neuron's output is computed by summing each previous-layer neuron's output multiplied by its connection weight to the neuron currently being calculated, and then passing that sum through the activation function (currently sigmoid) to get the new output. The sum variable itself could be a double, since it can become a really large number when the network is big.
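A minimal sketch of that computation (omitting a bias term for brevity; the function names are illustrative):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Sigmoid activation: squashes any real-valued sum into (0, 1).
    float sigmoid(double x) {
        return static_cast<float>(1.0 / (1.0 + std::exp(-x)));
    }

    // Output of one hidden neuron: the weighted sum of the previous layer's
    // outputs is accumulated in a double (it can grow large in a big network),
    // then passed through the sigmoid. Stored weights and outputs stay float.
    float neuron_output(const std::vector<float>& prev_outputs,
                        const std::vector<float>& weights) {  // one weight per input
        double sum = 0.0;  // double accumulator, as suggested above
        for (std::size_t i = 0; i < prev_outputs.size(); ++i) {
            sum += static_cast<double>(prev_outputs[i]) * weights[i];
        }
        return sigmoid(sum);  // result is back in (0, 1), safe to store as float
    }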


2. What about connection weights: could they be floats?

While inputs and neurons' outputs are in the range 0-1.0 because of the sigmoid, weights are allowed to be bigger than that.


Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative. I decided not to ask here what precision the gradient variable should be, feeling that float will simply not be precise enough, especially when the network is deep.
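(As a worked illustration of that vanishing effect, a minimal sketch: the sigmoid derivative s'(x) = s(x)(1 - s(x)) is at most 0.25, and backpropagation multiplies one such factor per layer.)

    #include <cmath>
    #include <cstdio>

    // Sigmoid and its derivative: s'(x) = s(x) * (1 - s(x)), never above 0.25.
    double sigmoid(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
    double sigmoid_deriv(double x) { double s = sigmoid(x); return s * (1.0 - s); }

    int main() {
        // Backpropagation multiplies one derivative factor per layer, so even
        // in the best case (x = 0, derivative = 0.25) the factor shrinks fast.
        double gradient_factor = 1.0;
        for (int layer = 1; layer <= 10; ++layer) {
            gradient_factor *= sigmoid_deriv(0.0);
            std::printf("after %2d layers: %g\n", layer, gradient_factor);
        }
        return 0;
    }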

Aiden Anomaly asked Nov 10 '16



2 Answers

  1. What about the precision of hidden neurons' outputs: would it be safe to make them floats too?

Using float32 everywhere is usually the safe first choice for most neural network applications. GPU toolchains for deep learning currently work primarily in float32, so many practitioners stick to float32 everywhere. For many applications, even 16-bit floating-point values could be sufficient. Some extreme examples show that high-accuracy networks can be trained with as little as 2 bits per weight (https://arxiv.org/abs/1610.00324).

The complexity of deep networks is usually limited not by computation time but by the amount of RAM on a single GPU and the throughput of the memory bus. Even if you're working on a CPU, using a smaller data type still helps to use the cache more efficiently. You're rarely limited by machine data type precision.

since colors in an image can only be in the range 0-255,

You're doing it wrong. You force the network to learn the scale of your input data, when it is already known (unless you're using a custom weight initialization procedure). Better results are usually achieved when the input data is normalized to the range (-1, 1) or (0, 1) and the weights are initialized so that the average output of the layer is at the same scale. This is a popular initialization technique: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

If inputs are in the range [0, 255], then with an average input being ~ 100, and weights being ~ 1, the activation potential (the argument of the activation function) is going to be ~ 100×N, where N is the number of layer inputs, likely far away in the "flat" part of the sigmoid. So either you initialize your weights to be ~ 1/(100×N), or you scale your data and use any popular initialization method. Otherwise the network will have to spend a lot of training time just to bring the weights to this scale.
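A minimal sketch of such an initialization (one common Xavier-style variant uses variance 1/fan_in; others use 2/(fan_in + fan_out); the function name is illustrative):

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Xavier/Glorot-style initialization (sketch): draw weights with variance
    // proportional to 1/fan_in, so the pre-activation sum stays at roughly
    // unit scale no matter how many inputs N the layer has.
    std::vector<float> init_weights(std::size_t fan_in, std::mt19937& rng) {
        std::normal_distribution<float> dist(
            0.0f, std::sqrt(1.0f / static_cast<float>(fan_in)));
        std::vector<float> weights(fan_in);
        for (float& w : weights) {
            w = dist(rng);
        }
        return weights;
    }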

Stochastic gradient descent backpropagation suffers from the vanishing gradient problem because of the activation function's derivative. I decided not to ask here what precision the gradient variable should be, feeling that float will simply not be precise enough, especially when the network is deep.

It's much less a matter of machine arithmetic precision than of the scale of the outputs at each layer. In practice:

  • preprocess input data (normalize to (-1, 1) range)
  • if you have more than 2 layers, don't use sigmoids; use rectified linear units (ReLU) instead (see the sketch below)
  • initialize weights carefully
  • use batch normalization

This video should be helpful to learn these concepts if you're not familiar with them.
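As a small illustration of the ReLU recommendation above (a minimal sketch):

    #include <algorithm>

    // Rectified linear unit: identity for positive inputs, zero otherwise.
    // Its derivative is 1 for x > 0, so gradients do not shrink layer by layer
    // the way they do with sigmoid's derivative, which never exceeds 0.25.
    float relu(float x) {
        return std::max(0.0f, x);
    }

    float relu_deriv(float x) {
        return x > 0.0f ? 1.0f : 0.0f;
    }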

sastanin answered Sep 20 '22


From "least amount of bits needed for a single neuron":

The following papers have studied this question (descending chronological order):

  • Accelerating Deep Convolutional Networks Using Low-Precision and Sparsity. Ganesh Venkatesh, Eriko Nurvitadhi, Debbie Marr. 2016. https://arxiv.org/abs/1610.00324
  • Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1. Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio. 2016. https://arxiv.org/abs/1602.02830
  • Deep Learning with Limited Numerical Precision. Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan. 2015. https://arxiv.org/abs/1502.02551
  • Training Deep Neural Networks with Low Precision Multiplications. Matthieu Courbariaux, Jean-Pierre David, Yoshua Bengio. 2014. https://arxiv.org/abs/1412.7024
  • Improving the Speed of Neural Networks on CPUs. Vincent Vanhoucke, Andrew Senior, Mark Z. Mao. 2011. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf ; https://scholar.google.com/scholar?cluster=14667574137314459294&hl=en&as_sdt=0,22

An example figure appears in Deep Learning with Limited Numerical Precision (linked above).

Franck Dernoncourt answered Sep 18 '22