I'm trying to retrain the Inception v3 model in TensorFlow on my own custom categories. I have downloaded some data and organized it into one directory per category. When I run the Python script, it creates bottlenecks for the images, but on the first training step (step 0) it hits a critical error where it tries to take a modulo by zero. The error comes from the get_image_path function when it computes mod_index, which is index % len(category_list), so category_list must be empty, right?
Why is this happening and how can I prevent it?
EDIT: Here's the exact output I'm seeing inside Docker:
2016-07-04 01:27:52.005912: Step 0: Train accuracy = 40.0%
2016-07-04 01:27:52.006025: Step 0: Cross entropy = 1.109777
CRITICAL:tensorflow:Category has no images - validation.
Traceback (most recent call last):
  File "tensorflow/examples/image_retraining/retrain.py", line 824, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "tensorflow/examples/image_retraining/retrain.py", line 794, in main
    bottleneck_tensor))
  File "tensorflow/examples/image_retraining/retrain.py", line 484, in get_random_cached_bottlenecks
    bottleneck_tensor)
  File "tensorflow/examples/image_retraining/retrain.py", line 392, in get_or_create_bottleneck
    bottleneck_dir, category)
  File "tensorflow/examples/image_retraining/retrain.py", line 281, in get_bottleneck_path
    category) + '.txt'
  File "tensorflow/examples/image_retraining/retrain.py", line 257, in get_image_path
    mod_index = index % len(category_list)
ZeroDivisionError: integer division or modulo by zero
Fix:
The issue happens when one of your subfolders contains too few images.
I faced the same issue when a particular category had fewer than 30 images in total; try increasing the image count for that category to resolve it.
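To find out which categories are at risk before you start training, you can count the images in each label folder. This is a minimal Python 3 sketch, not part of retrain.py. It assumes the usual layout of one subfolder per label directly under the image directory, counts only .jpg/.jpeg files (an assumption based on the JPEG-only extensions the script looked for at the time), and uses the 30-image figure from this answer as a rule of thumb rather than a hard limit. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: mirrors the JPEG-only extensions retrain.py searched for at the time.
IMAGE_EXTENSIONS = {".jpg", ".jpeg"}


def count_images_per_label(image_dir):
    """Count image files directly inside each label subfolder of image_dir."""
    counts = {}
    for label_dir in sorted(p for p in Path(image_dir).iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(
            1
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def labels_below(image_dir, minimum=30):
    """Return {label: count} for labels with fewer than `minimum` images."""
    return {
        label: n
        for label, n in count_images_per_label(image_dir).items()
        if n < minimum
    }
```

For example, labels_below("/path/to/images") returns only the folders that need more images.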
Reason:
For each label (subfolder), retrain.py splits the images into three sets (training, testing and validation). It assigns each image using a stable pseudo-random percentage between 0 and 100, computed from a hash of the image's file name.
An image goes into the validation set if that value is below the validation percentage, into the testing set if it is below the validation and testing percentages combined, and into the training set otherwise.
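The split can be sketched as follows. This is a simplified Python 3 reconstruction modeled on create_image_lists() in later versions of retrain.py, not a copy of it: the MAX_NUM_IMAGES_PER_CLASS modulus is taken from those later versions (older ones used a different constant), and split_images() is a hypothetical helper. The key property is that the same file name always maps to the same set, so adding images later does not reshuffle existing ones.

```python
import hashlib
import re

# Assumption: value used by later versions of retrain.py; older versions
# used a smaller modulus. Either way the result is roughly uniform in [0, 100].
MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1


def percentage_hash(file_name):
    """Map a file name to a stable pseudo-random percentage in [0, 100]."""
    # Anything after '_nohash_' is ignored, so variants of one photo share a set.
    hash_name = re.sub(r"_nohash_.*$", "", file_name)
    digest = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    return (int(digest, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_IMAGES_PER_CLASS
    )


def split_images(file_names, validation_percentage=10, testing_percentage=10):
    """Assign each file name to exactly one of training/testing/validation."""
    sets = {"training": [], "testing": [], "validation": []}
    for name in file_names:
        p = percentage_hash(name)
        if p < validation_percentage:
            sets["validation"].append(name)
        elif p < validation_percentage + testing_percentage:
            sets["testing"].append(name)
        else:
            sets["training"].append(name)
    return sets
```

Nothing in this logic forces any image into validation, so a small label can legitimately end up with an empty validation list.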
So if a label contains only a few images (say 25) and the validation percentage is left at its default of 10, each image has only about a 10% chance of landing in validation, and there is a real chance that none of them does. The validation set for that label then ends up empty.
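A rough estimate shows why small categories are risky. If you idealize each image's hash percentage as an independent uniform draw (an assumption; the real values are deterministic per file name), an image misses validation with probability 1 - 10/100 = 0.9, so a label with n images gets an empty validation set with probability 0.9^n. The function names below are hypothetical.

```python
def prob_label_has_no_validation(num_images, validation_percentage=10.0):
    """Chance that none of num_images lands in validation, treating each
    image's hash percentage as an independent uniform draw (an idealization)."""
    p_validation = validation_percentage / 100.0
    return (1.0 - p_validation) ** num_images


def prob_any_label_has_no_validation(image_counts, validation_percentage=10.0):
    """Chance that at least one label ends up with an empty validation set."""
    p_all_ok = 1.0
    for n in image_counts:
        p_all_ok *= 1.0 - prob_label_has_no_validation(n, validation_percentage)
    return 1.0 - p_all_ok


for n in (10, 25, 30, 50):
    print(n, round(prob_label_has_no_validation(n), 3))
# 10 0.349
# 25 0.072
# 30 0.042
# 50 0.005
```

Under these assumptions, 25 images leave a label without validation images about 7% of the time and 30 images still about 4% of the time, and the risk compounds across labels. Adding images makes the crash less likely but never impossible, which is why the retrain.py change in this answer is the more reliable fix.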
Later, once all bottlenecks are created and the script tries to compute validation accuracy, it first logs a critical message:
CRITICAL:tensorflow:Category has no images - validation.
It then carries on executing and crashes when it computes index modulo the length of the validation list, which is 0.
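The crash itself is just Python's % operator applied to the length of an empty list. The sketch below is a hypothetical, simplified stand-in for the lookup in get_image_path() (it is not the retrain.py code) that raises a descriptive error instead. It also includes a helper that scans an image_lists dictionary for empty sets before training starts; the dictionary is assumed to have the {label: {'training': [...], 'testing': [...], 'validation': [...]}} shape that create_image_lists() returns.

```python
def get_image_from_category(image_lists, label_name, index, category):
    """Simplified, hypothetical version of the lookup in get_image_path()."""
    category_list = image_lists[label_name][category]
    if not category_list:
        # retrain.py only logs a CRITICAL message here and carries on, so its
        # modulo then raises ZeroDivisionError; failing loudly is clearer.
        raise ValueError(f"Label {label_name!r} has no {category} images")
    mod_index = index % len(category_list)
    return category_list[mod_index]


def find_empty_categories(image_lists, categories=("training", "testing", "validation")):
    """Return (label, category) pairs whose image list is empty."""
    return [
        (label, category)
        for label, lists in sorted(image_lists.items())
        for category in categories
        if not lists.get(category)
    ]
```

Calling find_empty_categories() on the result of create_image_lists() before training reports every label that would trigger this crash.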
I've modified retrain.py to ensure there is at least one image in the validation set for every label (around line 201*):
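      # Send the first image of each label to validation so the set is never empty.
      if not validation_images:
        validation_images.append(base_name)
      elif percentage_hash < validation_percentage:
        validation_images.append(base_name)
      # ...the existing testing/training branches follow unchanged.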
(*) The line number may change in future releases; look for the percentage_hash < validation_percentage check in create_image_lists().