I wanted to train on multiple CPUs, so I ran this command:
C:\Users\solution\Desktop\Tensorflow\research>python object_detection/train.py --logtostderr --pipeline_config_path=C:\Users\solution\Desktop\Tensorflow\myFolder\power_drink.config --train_dir=C:\Users\solution\Desktop\Tensorflow\research\object_detection\train --num_clones=2 --clone_on_cpu=True
and I got the following error:
Traceback (most recent call last):
  File "object_detection/train.py", line 169, in <module>
    tf.app.run()
  File "C:\Users\solution\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 165, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\Users\solution\Desktop\Tensorflow\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 158, in _create_losses
    train_config.merge_multiple_label_boxes)
ValueError: not enough values to unpack (expected 7, got 0)
If I set num_clones to 1 or omit it, training works normally. I also tried setting --ps_tasks=1, which didn't help.
Any advice would be appreciated.
I solved this issue by changing one parameter in my original configuration slightly:
...
train_config: {
  fine_tune_checkpoint: "C:/some_path/model.ckpt"
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
  ...
}
...
Either changing the parameter to replicas_to_aggregate: 1 or setting sync_replicas: false solves the problem for me, since I was training on only one graphics card and did not have any replicas (as you would have when training on TPU).
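If you would rather script that change than edit the file by hand, here is a minimal sketch that patches a `key: value` line in the config text with a regular expression. The helper name set_train_option and the sample text are made up for illustration; it treats the config as plain text and is not part of the Object Detection API.

```python
import re


def set_train_option(config_text, key, value):
    """Return config_text with every `key: <anything>` line set to `key: value`.

    Hypothetical helper: it edits the pipeline config as plain text and
    knows nothing about the protobuf schema behind it.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*:[ \t]*).*$", re.MULTILINE
    )
    # A function replacement avoids backreference ambiguity when the
    # new value starts with a digit.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError("option not found in config: " + key)
    return new_text


# Illustrative fragment modelled on the config shown above.
sample = """train_config: {
  batch_size: 1
  sync_replicas: true
  replicas_to_aggregate: 8
}
"""

patched = set_train_option(sample, "replicas_to_aggregate", "1")
print(patched)
```

Calling set_train_option(sample, "sync_replicas", "false") would apply the other variant of the fix.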
You don't mention which type of model you are training. If, like me, you were using the default model from the TensorFlow Object Detection API example (Faster-RCNN-Inception-V2), then num_clones should equal the batch_size. I was using a GPU rather than CPUs, but when I went from one clone to two I saw a similar error, and setting batch_size: 2 in the training config file was the solution.
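A likely explanation, as far as I can tell from trainer.py, is that the trainer splits the batch across clones with integer division, so batch_size: 1 with --num_clones=2 leaves each clone with zero examples, which would match "expected 7, got 0". The sketch below only illustrates that arithmetic under this assumption; per_clone_batch_size and check_clone_settings are hypothetical helpers, not API functions.

```python
def per_clone_batch_size(batch_size, num_clones):
    """Examples each clone receives, assuming an integer-division split.

    Assumption: the trainer computes batch_size // num_clones.
    """
    return batch_size // num_clones


def check_clone_settings(batch_size, num_clones):
    """Return "ok" or a short description of the mismatch."""
    if per_clone_batch_size(batch_size, num_clones) == 0:
        return "batch_size %d is too small for %d clones" % (
            batch_size, num_clones)
    if batch_size % num_clones != 0:
        return "batch_size %d is not a multiple of %d clones" % (
            batch_size, num_clones)
    return "ok"


# The failing setup from the question, then the fixed one.
print(check_clone_settings(1, 2))
print(check_clone_settings(2, 2))
```

Under that assumption, any batch_size that is a multiple of num_clones should avoid the empty-batch case, with batch_size equal to num_clones being the smallest such choice.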