 

Float16 slower than float32 in keras

I'm testing out my new NVIDIA Titan V, which supports float16 operations. I noticed that during training, float16 is much slower (~800 ms/step) than float32 (~500 ms/step).

To do float16 operations, I changed my keras.json file to:

{
    "backend": "tensorflow",
    "floatx": "float16",
    "image_data_format": "channels_last",
    "epsilon": 1e-07
}
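(Equivalently, these defaults can be set programmatically through the Keras backend API instead of editing keras.json; a minimal sketch:)

from keras import backend as K

K.set_floatx('float16')   # same effect as "floatx": "float16" in keras.json
K.set_epsilon(1e-4)       # a larger epsilon is commonly used with float16 to avoid underflow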

Why are the float16 operations so much slower? Do I need to make modifications to my code and not just the keras.json file?

I am using CUDA 9.0, cuDNN 7.0, TensorFlow 1.7.0, and Keras 2.1.5 on Windows 10. My Python 3.5 code is below:

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, Activation, Flatten, Dense

img_width, img_height = 336, 224

train_data_dir = 'C:\\my_dir\\train'
test_data_dir = 'C:\\my_dir\\test'
batch_size=128

datagen = ImageDataGenerator(rescale=1./255,
    horizontal_flip=True,   # randomly flip the images 
    vertical_flip=True) 

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

test_generator = datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

# Architecture of NN
model = Sequential()
model.add(Conv2D(32,(3, 3), input_shape=(img_height, img_width, 3),padding='same',kernel_initializer='lecun_normal'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32,(3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64,(3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64,(3, 3),padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(AveragePooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('sigmoid'))

my_rmsprop = keras.optimizers.RMSprop(lr=0.0001, rho=0.9, epsilon=1e-04, decay=0.0)
model.compile(loss='binary_crossentropy',
          optimizer=my_rmsprop,
          metrics=['accuracy'])

# Training 
nb_epoch = 32
nb_train_samples = 512
nb_test_samples = 512

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples/batch_size,
    epochs=nb_epoch,
    verbose=1,
    validation_data=test_generator,
    validation_steps=nb_test_samples/batch_size)

# Evaluating on the testing set
model.evaluate_generator(test_generator, steps=nb_test_samples // batch_size)  # steps = number of batches, not samples
Mark asked Apr 11 '18

People also ask

Is float16 faster than float32?

Efficient training of modern neural networks often relies on using lower precision data types. Peak float16 matrix multiplication and convolution performance is 16x faster than peak float32 performance on A100 GPUs.
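Whether float16 is actually faster depends on the GPU generation, so it can help to check which device and compute capability TensorFlow sees; a small sketch using the TF 1.x device_lib module:

from tensorflow.python.client import device_lib

# Print every device TensorFlow can see, including GPU name and compute capability.
for d in device_lib.list_local_devices():
    print(d.name, d.physical_device_desc)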

What is the difference between float32 and float64?

float32 is a 32-bit number, while float64 uses 64 bits. That means float64 values take up twice as much memory, and operations on them may be slower on some machine architectures. However, float64 can represent numbers much more accurately than 32-bit floats, and it can also store much larger numbers.
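A quick way to see the size and precision difference; a small sketch using NumPy:

import numpy as np

print(np.dtype(np.float32).itemsize)     # 4 bytes per value
print(np.dtype(np.float64).itemsize)     # 8 bytes per value
print(np.finfo(np.float32).precision)    # ~6 significant decimal digits
print(np.finfo(np.float64).precision)    # ~15 significant decimal digits
print(np.finfo(np.float32).max)          # ~3.4e+38
print(np.finfo(np.float64).max)          # ~1.8e+308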

What does float32 mean in Python?

The float type in Python represents a floating-point number. Floats are used to represent real numbers and are written with a decimal point dividing the integer and fractional parts. For example, 97.98, 32.3e+18, and -32.54e100 are all floating-point numbers.


1 Answer

From the cuDNN documentation (section 2.7, subsection Type Conversion) you can see:

Note: Accumulators are 32-bit integers which wrap on overflow.

and that this holds for the standard INT8 data type of the data input, the filter input, and the output.

Under those assumptions, @jiandercy is right that there is a float16 to float32 conversion and then a conversion back before the result is returned, so float16 ends up being slower.
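One way to confirm this overhead on a particular GPU is to time a single convolution layer in both precisions before committing to a full training run. A rough sketch (the layer, batch, and image sizes below are arbitrary, not taken from the question):

import time
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D

def time_conv(dtype, n_iter=20):
    # Rebuild a single-Conv2D model in the requested precision and time train_on_batch.
    K.clear_session()
    K.set_floatx(dtype)
    model = Sequential([Conv2D(64, (3, 3), padding='same', input_shape=(224, 336, 3))])
    model.compile(loss='mse', optimizer='sgd')
    x = np.random.rand(32, 224, 336, 3).astype(dtype)
    y = np.random.rand(32, 224, 336, 64).astype(dtype)
    model.train_on_batch(x, y)            # warm-up (graph construction, cuDNN autotune)
    start = time.time()
    for _ in range(n_iter):
        model.train_on_batch(x, y)
    return (time.time() - start) / n_iter

print('float32: %.1f ms/batch' % (1000 * time_conv('float32')))
print('float16: %.1f ms/batch' % (1000 * time_conv('float16')))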

LemonPy answered Oct 24 '22