[Tensorflow][Object detection] ValueError when trying to train with --num_clones=2

Question

I wanted to train on multiple CPUs, so I ran this command:

C:\Users\solution\Desktop\Tensorflow\research>python object_detection/train.py --logtostderr --pipeline_config_path=C:\Users\solution\Desktop\Tensorflow\myFolder\power_drink.config --train_dir=C:\Users\solution\Desktop\Tensorflow\research\object_detection\train --num_clones=2 --clone_on_cpu=True

and I got the following error:

Traceback (most recent call last):
  File "object_detection/train.py", line 169, in <module>
    tf.app.run()
  File "C:\Users\solution\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 165, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\Users\solution\Desktop\Tensorflow\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 158, in _create_losses
    train_config.merge_multiple_label_boxes)
ValueError: not enough values to unpack (expected 7, got 0)

If I set num_clones to 1 or omit it, training works normally. I also tried setting --ps_tasks=1, which didn't help.

Any advice would be appreciated.

Alexander Pacha · Accepted Answer

I solved this issue by changing one parameter in my original configuration slightly:

...
train_config: {
  fine_tune_checkpoint: "C:/some_path/model.ckpt"
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
  ...
}
...

Either changing the parameter to replicas_to_aggregate: 1 or setting sync_replicas: false solves the problem for me, since I was training on only one graphics card and did not have any replicas (as you would when training on a TPU).
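For anyone who wants to script that change instead of editing the file by hand, here is a minimal sketch that rewrites a scalar field in the config text. The helper name set_config_field and the SAMPLE string are hypothetical and not part of the Object Detection API. It uses plain regular expressions rather than the protobuf parser, so it only handles simple `field: value` lines.

```python
import re

# Hypothetical excerpt of a pipeline config, mirroring the fields shown above.
SAMPLE = """train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
}
"""


def set_config_field(config_text, field, value):
    """Return config_text with every scalar `field: <old>` set to `value`.

    Raises KeyError if the field does not occur, so a typo in the field
    name does not silently leave the config unchanged.
    """
    pattern = re.compile(
        r"^(\s*" + re.escape(field) + r"\s*:\s*)\S+", re.MULTILINE
    )
    # A function replacement avoids ambiguity between the group reference
    # and a value that starts with a digit.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(field)
    return new_text


if __name__ == "__main__":
    # Either of the two changes from the answer:
    print(set_config_field(SAMPLE, "replicas_to_aggregate", "1"))
    print(set_config_field(SAMPLE, "sync_replicas", "false"))
```

Only one of the two changes is needed. The sketch leaves every other line of the config untouched.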

4Oh4 · Answer

You don't mention which type of model you are training. If, like me, you were using the default model from the TensorFlow Object Detection API example (Faster-RCNN-Inception-V2), then num_clones should equal the batch_size. I was using a GPU, but when I went from one clone to two I saw a similar error, and setting batch_size: 2 in the training config file was the solution.
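One possible explanation for why batch_size matters here, offered as an assumption rather than something confirmed in this thread: if the trainer divides batch_size among the clones with integer division, then batch_size: 1 with --num_clones=2 leaves each clone with 0 examples, and unpacking an empty batch fails with the same kind of ValueError as in the question. The sketch below only mimics that arithmetic in plain Python. The function names and the 7-field dummy examples are made up for illustration and are not taken from trainer.py.

```python
def per_clone_batch_size(batch_size, num_clones):
    # Assumption: the global batch is split across clones with integer
    # division, so 1 // 2 == 0.
    return batch_size // num_clones


def unpack_batch(examples):
    # Each dummy example has 7 fields. zip(*examples) transposes the list
    # of examples into 7 per-field tuples; an empty list yields nothing,
    # so the 7-way unpacking raises ValueError.
    f1, f2, f3, f4, f5, f6, f7 = zip(*examples)
    return f1, f2, f3, f4, f5, f6, f7


def simulate(batch_size, num_clones):
    """Return "ok" or the ValueError message for one clone's batch."""
    n = per_clone_batch_size(batch_size, num_clones)
    examples = [tuple(range(7)) for _ in range(n)]
    try:
        unpack_batch(examples)
    except ValueError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print(simulate(1, 2))  # per-clone batch is empty
    print(simulate(2, 2))  # one example per clone
```

Under that assumption, any batch_size that is at least num_clones would avoid the empty per-clone batch, which is consistent with batch_size: 2 fixing the two-clone case.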

Tags: python, tensorflow, object-detection

KoS