
[Tensorflow][Object detection] ValueError when trying to train with --num_clones=2

I wanted to train on multiple CPUs, so I ran this command:

C:\Users\solution\Desktop\Tensorflow\research>python object_detection/train.py --logtostderr --pipeline_config_path=C:\Users\solution\Desktop\Tensorflow\myFolder\power_drink.config --train_dir=C:\Users\solution\Desktop\Tensorflow\research\object_detection\train --num_clones=2 --clone_on_cpu=True

and I got the following error:

Traceback (most recent call last):
  File "object_detection/train.py", line 169, in <module>
    tf.app.run()
  File "C:\Users\solution\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 165, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 246, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "C:\Users\solution\Desktop\Tensorflow\research\slim\deployment\model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "C:\Users\solution\Desktop\Tensorflow\research\object_detection\trainer.py", line 158, in _create_losses
    train_config.merge_multiple_label_boxes)
ValueError: not enough values to unpack (expected 7, got 0)

If I set num_clones to 1 or omit it, training works normally. I also tried setting --ps_tasks=1, which didn't help.

Any advice would be appreciated.

KoS asked Feb 14 '18 20:02



2 Answers

I solved this issue by changing one parameter in my original configuration slightly:

...
train_config: {
  fine_tune_checkpoint: "C:/some_path/model.ckpt"
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
  ...
}
...

Either changing the parameter to replicas_to_aggregate: 1 or setting sync_replicas: false solved the problem for me, since I was training on only one graphics card and did not have any replicas (as you would when training on a TPU).

Alexander Pacha answered Oct 01 '22 02:10



You don't mention which type of model you are training. If, like me, you were using the default model from the TensorFlow Object Detection API example (Faster-RCNN-Inception-V2), then num_clones should equal batch_size. I was using a GPU rather than CPUs, but when I went from one clone to two I saw a similar error, and setting batch_size: 2 in the training config file was the solution.
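
A possible explanation, as far as I can tell: the legacy trainer seems to split train_config.batch_size across the clones with integer division, so batch_size: 1 with --num_clones=2 would leave each clone with zero examples, which would fit the "expected 7, got 0" in the traceback. That is my assumption about trainer.py, so check it against your version. The sketch below only reproduces the arithmetic; per_clone_batch_size and check_clone_config are hypothetical helper names, not part of the Object Detection API.

```python
def per_clone_batch_size(batch_size, num_clones):
    """Examples each clone would get if the batch is split by integer division.

    Assumption: the trainer divides batch_size by num_clones this way.
    """
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    return batch_size // num_clones


def check_clone_config(batch_size, num_clones):
    """Return a list of warnings for a batch_size / num_clones combination."""
    warnings = []
    share = per_clone_batch_size(batch_size, num_clones)
    if share == 0:
        # Each clone would receive an empty batch.
        warnings.append(
            "batch_size=%d is smaller than num_clones=%d: every clone gets 0 examples"
            % (batch_size, num_clones)
        )
    elif batch_size % num_clones != 0:
        # Integer division silently drops the remainder.
        warnings.append(
            "batch_size=%d is not a multiple of num_clones=%d: effective batch is %d"
            % (batch_size, num_clones, share * num_clones)
        )
    return warnings


if __name__ == "__main__":
    # The combination from the question, then the one that worked for me.
    for batch_size, num_clones in [(1, 2), (2, 2)]:
        problems = check_clone_config(batch_size, num_clones)
        print(batch_size, num_clones, problems or "ok")
```

If that assumption holds, any batch_size that is a multiple of num_clones should avoid the empty batch; making the two equal is just the smallest such value.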

4Oh4 answered Oct 01 '22 01:10
