I am training a model, and when I open the TPU in the Google Cloud Platform console, it shows me the CPU utilization (on the TPU, I suppose). It is really, really low (around 0.07%), so maybe it is the VM's CPU instead? I am wondering whether the training is actually running properly or whether the TPUs are just that strong.
Is there any other way to check the TPU usage? Maybe with a ctpu command?
Before you run this Colab notebook, make sure that your hardware accelerator is a TPU by checking your notebook settings: Runtime > Change runtime type > Hardware accelerator > TPU.
A single Cloud TPU chip contains 2 cores, each of which contains multiple matrix units (MXUs) designed to accelerate programs dominated by dense matrix multiplications and convolutions (see System Architecture).
Use Cloud TPUs for free, right in your browser. If you'd like to get started with Cloud TPUs right away, you can access them for free in your browser using Google Colab.
I would recommend using the TPU profiling tools that plug into TensorBoard. A good tutorial for installing and using these tools can be found here.
You'll run the profiler while your TPU is training. It will add an extra tab to your TensorBoard with TPU-specific profiling information. Among the most useful:
Based on these metrics, the profiler will suggest ways to start optimizing your model to train well on a TPU. You can also dig into the more sophisticated profiling tools like a trace viewer, or a list of the most expensive graph operations.
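To make the "run the profiler while your TPU is training" step concrete, here is a minimal sketch rather than the exact workflow from the linked tutorial. It assumes TensorFlow 2.x, where tf.profiler.experimental.start and tf.profiler.experimental.stop are available; the helper name profile_steps, the train_step callable, and the log directory are hypothetical. Whether the trace includes device-side TPU activity depends on how your TPU is attached, so treat this as a starting point only.

```python
from typing import Callable, Optional


def profile_steps(train_step: Callable[[], None], num_steps: int,
                  logdir: str, profiler: Optional[object] = None) -> None:
    """Run `train_step` `num_steps` times while a profiler records a trace.

    `profiler` is any object with start(logdir) and stop(); by default this
    is tf.profiler.experimental (assumption: TensorFlow 2.x is installed).
    """
    if profiler is None:
        # Imported lazily so this module loads on machines without TensorFlow.
        import tensorflow as tf
        profiler = tf.profiler.experimental
    profiler.start(logdir)
    try:
        for _ in range(num_steps):
            train_step()
    finally:
        # stop() writes the collected trace under logdir, even if a step fails.
        profiler.stop()
```

You would call it with a function that runs one training step, for example profile_steps(my_step, 20, "gs://my-bucket/logs") (both names are placeholders), and then point TensorBoard at the same log directory to open the profiling tab.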
For some guidelines on performance tuning (in addition to those ch_mike already linked) you can look at the TPU performance guide.
If you are looking at GCP -> Compute Engine -> TPU, you are looking at the correct spot. If you look at the monitoring graphs of your associated Compute Engine instance, you'll see that its CPU graph is different.
Currently, there doesn't seem to be any other way to look for that information, since none of these options provide it:
gcloud compute tpus describe <tpu-name> --zone=<zone>
ctpu status --details
Nor does the TPU API
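If you want to check the output of those two commands yourself, they can be scripted. The following is only a sketch: the helper names (describe_tpu_cmd, ctpu_status_cmd, run_if_available) are made up for illustration, and the TPU name and zone are placeholders you need to supply. As noted above, expect status information in the output, not utilization figures.

```python
import shutil
import subprocess
from typing import List, Optional


def describe_tpu_cmd(tpu_name: str, zone: str) -> List[str]:
    """Build the argv for `gcloud compute tpus describe <tpu-name> --zone=<zone>`."""
    return ["gcloud", "compute", "tpus", "describe", tpu_name, f"--zone={zone}"]


def ctpu_status_cmd() -> List[str]:
    """Build the argv for `ctpu status --details`."""
    return ["ctpu", "status", "--details"]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run `cmd` and return its stdout, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # "my-tpu" and "us-central1-b" are placeholders, not values from this thread.
    for cmd in (describe_tpu_cmd("my-tpu", "us-central1-b"), ctpu_status_cmd()):
        output = run_if_available(cmd)
        print(" ".join(cmd))
        print(output if output is not None else "(tool not found on PATH)")
```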
As for whether your training is proper or not, that is hard to say. You can refer to Using TPU and make sure you are following the guidelines there. Another useful resource would be Improving training speed.