 

How to make TensorFlow use more available CPU

How can I fully utilize each of my EC2 cores?

I'm using a c4.4xlarge AWS Ubuntu EC2 instance and TensorFlow to build a large convolutional neural network. nproc says that my EC2 instance has 16 cores. When I run my convnet training code, the top utility says that I'm only using 400% CPU. I was expecting it to use 1600% CPU because of the 16 cores. The AWS EC2 monitoring tab confirms that I'm only using 25% of my CPU capacity. This is a huge network: on my new Mac Pro it consumes about 600% CPU and takes a few hours to train, so I don't think the problem is that my network is too small.

I believe the line below ultimately determines CPU usage:

sess = tf.InteractiveSession(config=tf.ConfigProto())

I admit I don't fully understand the relationship between threads and cores, but I tried increasing the number of threads. It had the same effect as the line above: still 400% CPU.

NUM_THREADS = 16
sess = tf.InteractiveSession(config=tf.ConfigProto(intra_op_parallelism_threads=NUM_THREADS))

EDIT:

  • htop shows that I am actually using all 16 of my EC2 cores, but each core is only at about 25%
  • top shows that my total CPU % is around 400%, but occasionally it shoots up to 1300% and then almost immediately drops back to ~400%. This makes me think there could be a deadlock problem
asked Jul 16 '16 by user554481

1 Answer

Several things you can try:

Increase the number of threads

You already tried changing intra_op_parallelism_threads. Depending on your network, it can also make sense to increase inter_op_parallelism_threads. From the docs:

inter_op_parallelism_threads:

Nodes that perform blocking operations are enqueued on a pool of
inter_op_parallelism_threads available in each process. 0 means the system picks an appropriate number.

intra_op_parallelism_threads:

The execution of an individual op (for some op types) can be parallelized on a pool of intra_op_parallelism_threads. 0 means the system picks an appropriate number.

(Side note: the values shown in the configuration file referenced above are not the actual defaults TensorFlow uses, just example values. You can see the actual default configuration by manually inspecting the object returned by tf.ConfigProto().)
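For instance, a quick way to check the defaults (assuming TensorFlow 1.x, as used in the question):

import tensorflow as tf

config = tf.ConfigProto()
# Both thread-pool fields default to 0, i.e. "let TensorFlow pick an appropriate number".
print(config.intra_op_parallelism_threads)   # 0
print(config.inter_op_parallelism_threads)   # 0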

TensorFlow uses 0 for both of the above options, meaning it tries to choose appropriate values itself. I don't think TensorFlow picked poor values that caused your problem, but you can try out different values for the above options to be on the safe side.
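If you want to experiment, a minimal sketch setting both pools explicitly might look like this (TensorFlow 1.x API; 16 is just the core count reported by nproc in the question, not a recommended value):

import tensorflow as tf

NUM_CORES = 16  # cores reported by nproc on the c4.4xlarge instance

config = tf.ConfigProto(
    intra_op_parallelism_threads=NUM_CORES,  # threads used inside a single op (e.g. a large matmul)
    inter_op_parallelism_threads=NUM_CORES,  # threads used to run independent ops concurrently
)
sess = tf.InteractiveSession(config=config)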


Extract traces to see how well your code parallelizes

Have a look at the TensorFlow code optimization strategy of extracting execution traces (timelines).

Tracing gives you a timeline of which ops ran on which threads. In such a trace you can often see that the actual computation happens on far fewer threads than are available, and this could also be the case for your network. Around synchronization points all threads become active for a short moment, which is potentially the reason for the sporadic peaks in CPU utilization that you experience.
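A minimal sketch of how to capture such a trace with the TensorFlow 1.x timeline API; train_op and feed_dict stand in for whatever your training step actually runs:

import tensorflow as tf
from tensorflow.python.client import timeline

# Record per-op timing for a single training step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

sess.run(train_op, feed_dict=feed_dict,
         options=run_options, run_metadata=run_metadata)

# Write the step stats as a Chrome trace and open it via chrome://tracing.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())

The resulting timeline shows which thread each op ran on, so you can see directly whether the work is spread over all 16 cores or stuck on a handful of them.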

Miscellaneous

  • Make sure you are not running out of memory (htop)
  • Make sure you are not doing a lot of I/O or something similar
answered Sep 22 '22 by ben