
Why does shuffling my validation set in Keras change my model's performance?

Why I'm confused:

If I test my model on examples [A, B, C], it will obtain a certain accuracy. If I test the same model on examples [C, B, A], it should obtain the same accuracy. In other words, shuffling the examples shouldn't change my model's accuracy. But the accuracy does seem to change when I shuffle, as the steps below show.
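To be clear about why I expect this: accuracy is just a mean over per-example correctness, and a mean doesn't depend on the order of its inputs. A quick NumPy sketch with made-up toy values:

import numpy as np

# Per-example correctness (1 = correct, 0 = wrong) for a toy validation set.
correct = np.array([1, 0, 1, 1, 0], dtype=np.float64)

# Shuffling the examples doesn't change their mean, i.e. the accuracy.
shuffled = np.random.permutation(correct)
print(correct.mean(), shuffled.mean())  # both print 0.6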

Step-by-step:

Here is where I train the model:

model.fit_generator(batches, batches.nb_sample, nb_epoch=1, verbose=2,
                    validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)

Here is where I test the model, without shuffling the validation set:

gen = ImageDataGenerator()
results = []
for _ in range(3):
    # shuffle=False: the generator serves the validation images
    # in the same fixed order on every pass
    val_batches = gen.flow_from_directory(path+"valid", batch_size=batch_size*2,
                                          target_size=target_size, shuffle=False)
    result = model.evaluate_generator(val_batches, val_batches.nb_sample)
    results.append(result)

Here are the results (val_loss, val_acc):

[2.8174608421325682, 0.17300000002980231]
[2.8174608421325682, 0.17300000002980231]
[2.8174608421325682, 0.17300000002980231]

Notice that the validation accuracies are the same.

Here is where I test the model, with a shuffled validation set:

results = []
for _ in range(3):
    # shuffle=True: batch composition and order differ on every pass
    val_batches = gen.flow_from_directory(path+"valid", batch_size=batch_size*2,
                                          target_size=target_size, shuffle=True)
    result = model.evaluate_generator(val_batches, val_batches.nb_sample)
    results.append(result)

Here are the results (val_loss, val_acc):

[2.8174608802795409, 0.17299999999999999]
[2.8174608554840086, 0.1730000001192093]
[2.8174608268737793, 0.17300000059604645]

Notice that the validation losses and accuracies now vary slightly, despite an unchanged validation set and an unchanged model. What's going on?


Note:

I'm evaluating on the entire validation set each time: model.evaluate_generator returns after evaluating val_batches.nb_sample examples, which is the number of examples in the validation set.
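For reference, here's roughly what that call is doing, sketched by hand (my own helper, not Keras code; it assumes the model was compiled with metrics=['accuracy'], so evaluate returns a (loss, accuracy) pair):

def evaluate_whole_set(model, val_batches):
    # Pull batches until every validation example has been scored once,
    # then average the per-batch results weighted by batch size.
    seen, loss_sum, acc_sum = 0, 0.0, 0.0
    while seen < val_batches.nb_sample:
        x, y = next(val_batches)
        loss, acc = model.evaluate(x, y, verbose=0)
        loss_sum += loss * len(x)
        acc_sum += acc * len(x)
        seen += len(x)
    return loss_sum / seen, acc_sum / seen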

asked Jan 24 '17 by Matt Kleinsmith

1 Answer

This is a really interesting problem. It happens because neural networks compute in float32, which is much less precise than float64, and float32 addition is not associative: adding the same numbers in a different order can round differently. When the validation set is shuffled, the per-batch results are accumulated in a different order, so the rounded totals differ in their last few bits. Fluctuations like these are simply floating-point rounding error, not a real change in model performance.
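You can see the effect with plain NumPy (a toy sketch, not the actual Keras internals): summing the same float32 numbers in two different orders can produce slightly different results.

import numpy as np

# Made-up per-batch losses, stored as float32 like Keras's internals.
rng = np.random.RandomState(0)
losses = rng.uniform(0.0, 5.0, size=1000).astype(np.float32)

def running_sum(values):
    # Accumulate in float32, so every addition rounds to float32 precision.
    total = np.float32(0.0)
    for v in values:
        total += v
    return total

print(running_sum(losses))                   # one order
print(running_sum(rng.permutation(losses)))  # another order; typically
                                             # differs in the last digits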

In the case of your loss, notice that the differences appear around the seventh decimal digit, which matches the roughly seven significant decimal digits of precision that float32 offers. So, basically, you can treat all the numbers presented in your example as equal in terms of their float32 representation.
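You can verify this by casting your three "different" shuffled losses to float32 (plain NumPy, values copied from the question):

import numpy as np

# The three shuffled val_loss values reported above, as float64 literals.
vals = [2.8174608802795409, 2.8174608554840086, 2.8174608268737793]

# All three round to the same representable float32 number,
# so at float32 precision the losses are identical.
print([np.float32(v) for v in vals])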

answered Nov 18 '22 by Marcin Możejko