I'm testing out my new NVIDIA Titan V, which supports float16 operations. I noticed that during training, float16 is much slower (~800 ms/step) than float32 (~500 ms/step).
To do float16 operations, I changed my keras.json file to:
{
"backend": "tensorflow",
"floatx": "float16",
"image_data_format": "channels_last",
"epsilon": 1e-07
}
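For reference, the same switch can be made from Python by rewriting the config file before Keras is imported (a sketch using only the standard library; `set_floatx_in_config` is a hypothetical helper, and Keras itself reads the file from `~/.keras/keras.json` by default):

```python
import json

# Hypothetical helper: flip the "floatx" field in a keras.json-style config.
def set_floatx_in_config(config_path, dtype):
    with open(config_path) as f:
        config = json.load(f)
    config["floatx"] = dtype  # e.g. "float16" or "float32"
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)

# Demonstrated against a local copy of the file shown above.
path = "keras.json"
with open(path, "w") as f:
    json.dump({"backend": "tensorflow", "floatx": "float32",
               "image_data_format": "channels_last", "epsilon": 1e-07}, f)

set_floatx_in_config(path, "float16")
with open(path) as f:
    print(json.load(f)["floatx"])  # float16
```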
Why are the float16 operations so much slower? Do I need to make modifications to my code and not just the keras.json file?
I am using CUDA 9.0, cuDNN 7.0, TensorFlow 1.7.0, and Keras 2.1.5 on Windows 10. My Python 3.5 code is below:
import keras
from keras.models import Sequential
from keras.layers import (Activation, AveragePooling2D, Conv2D, Dense,
                          Flatten, MaxPooling2D)
from keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 336, 224
train_data_dir = 'C:\\my_dir\\train'
test_data_dir = 'C:\\my_dir\\test'
batch_size = 128

datagen = ImageDataGenerator(rescale=1. / 255,
                             horizontal_flip=True,  # randomly flip the images
                             vertical_flip=True)

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')

test_generator = datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary')
# Architecture of NN
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(img_height, img_width, 3),
                 padding='same', kernel_initializer='lecun_normal'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1))
model.add(Activation('sigmoid'))

my_rmsprop = keras.optimizers.RMSprop(lr=0.0001, rho=0.9, epsilon=1e-04, decay=0.0)
model.compile(loss='binary_crossentropy',
              optimizer=my_rmsprop,
              metrics=['accuracy'])
# Training
nb_epoch = 32
nb_train_samples = 512
nb_test_samples = 512
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=nb_epoch,
    verbose=1,
    validation_data=test_generator,
    validation_steps=nb_test_samples // batch_size)

# Evaluating on the testing set (evaluate_generator takes a step count,
# not a sample count, in Keras 2)
model.evaluate_generator(test_generator, steps=nb_test_samples // batch_size)
Efficient training of modern neural networks often relies on lower-precision data types: on an A100 GPU, for example, peak float16 matrix-multiplication and convolution throughput is 16x peak float32 throughput. So, in principle, float16 should be faster, not slower.
From the cuDNN documentation (section 2.7, subsection Type Conversion) you can see:

Note: Accumulators are 32-bit integers which wrap on overflow.

and that this holds for the standard INT8 data type of the data input, the filter input, and the output. Under those assumptions, @jiandercy is right that there is a float16-to-float32 conversion and then a back-conversion before returning the result, which is why float16 ends up slower.
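The same up-and-down conversion can be illustrated on the CPU with NumPy (a sketch of the idea, not the cuDNN code path): NumPy likewise has no native half-precision arithmetic, so float16 operands are converted to float32, computed on, and converted back to float16. The example also shows why a wider accumulator matters: float16 silently drops small addends once values grow.

```python
import numpy as np

# float16 matmul: NumPy emulates it by converting through float32,
# so the result dtype is float16 even though the arithmetic is wider.
a = np.ones((64, 64), dtype=np.float16)
b = np.ones((64, 64), dtype=np.float16)
c = a @ b
print(c.dtype)   # float16
print(c[0, 0])   # 64.0

# float16 has ~3 decimal digits of precision: above 2048 the spacing
# between representable values exceeds 1, so adding 1 is a no-op.
x = np.float16(2048) + np.float16(1)
print(x == np.float16(2048))  # True: the +1 is lost
```

This mirrors the cuDNN behavior described above: the conversions cost time, and accumulating in float32 avoids exactly the kind of lost-addend error shown in the last line.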