I have a TensorFlow model which is compiled to XLA for use with some Graphcore IPUs. For debugging purposes, I am trying to dump the XLA graph to a .dot file so I can visualise it in my browser.
For this I use the following flags:
--xla_dump_to=/mnt/data/perf_prof_5/xla_dump --xla_dump_hlo_as_dot --xla_dump_hlo_pass_re=forward-allocation --xla_hlo_graph_sharding_color
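In case it matters, I pass these through the XLA_FLAGS environment variable before TensorFlow starts up, roughly like this (assuming the flags are picked up from XLA_FLAGS):

import os

# Set the XLA debug options before TensorFlow initialises XLA,
# otherwise the flags are not picked up.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_dump_to=/mnt/data/perf_prof_5/xla_dump",
    "--xla_dump_hlo_as_dot",
    "--xla_dump_hlo_pass_re=forward-allocation",
    "--xla_hlo_graph_sharding_color",
])

import tensorflow as tf  # imported only after the flags are set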
However, I get multiple output files and I am not sure which one is the right one. Their names and number also vary based on whether or not I compile the graph (using the ipu.compile instruction).
Which file contains the graph and what do the names mean?
The XLA dumping is a TensorFlow native feature. It dumps one file per graph. The number of graphs produced depends on how many HLO modules are produced when the TensorFlow graph is lowered to XLA. This can generally be predicted from the number of sess.run calls on distinct graphs you make. For example, if your program contains a variable initialisation, then this initialisation will be compiled as a separate XLA graph and appear as a separate file when dumped. If your program creates a report op, then that will also be compiled as a separate XLA graph.
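As a rough illustration (a TF1-style sketch; the IPU device placement and configuration that actually send these ops through XLA are omitted, and the names are made up), each sess.run on a distinct graph below would be compiled and dumped as its own module:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

w = tf.get_variable("w", shape=[2, 2], initializer=tf.zeros_initializer())
x = tf.placeholder(tf.float32, shape=[2, 2])
y = tf.matmul(x, w)

with tf.Session() as sess:
    # The variable initialisation is compiled as one XLA graph ...
    sess.run(tf.global_variables_initializer())
    # ... and the main computation as another.
    sess.run(y, feed_dict={x: [[1., 2.], [3., 4.]]})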
Typically, ipu_compiler.compile forces the compilation into a single XLA graph. If you don't use ipu_compiler.compile, the native XLA scheduler will combine or split up parts of the TensorFlow graph as it sees fit, creating many XLA graphs - this is why you see far more graphs dumped when not using ipu_compiler.compile.
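A minimal sketch of what that looks like (Graphcore TF1 API; the IPU system configuration is omitted and the exact helper names may differ between SDK versions):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.python import ipu
from tensorflow.python.ipu import ipu_compiler

def my_net(x):
    # Everything inside this function is forced into a single XLA graph.
    w = tf.get_variable("w", shape=[128, 128])
    return tf.matmul(x, w)

with ipu.scopes.ipu_scope("/device:IPU:0"):
    x = tf.placeholder(tf.float32, shape=[1, 128])
    [out] = ipu_compiler.compile(my_net, inputs=[x])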
NOTE: There is no guarantee your compiled op will only produce one XLA graph. Sometimes, others are made, e.g. for casting.
As for the naming, it can be broken down as follows:
module_XXXX.YYYY.IPU.after_allocation-finder.before_forward-allocation.dot
We always have a module_ prefix, which simply signals that this is the graph for an HLO module.
The first XXXX is the HLO module's unique ID. There is no guarantee about the spacing between IDs, just that they are unique and increasing.
To understand the rest of the name - YYYY.IPU.......dot - we need to understand that the XLA graph is operated on by multiple different HLO passes, each modifying the XLA graph by optimizing, shuffling or otherwise rewriting it. After these passes, the graph is then lowered to Poplar. There are some TensorFlow native HLO passes, and there are some IPU specific ones. When dumping the XLA graphs, we can render the XLA graph before and after any HLO pass (e.g. to see the pass's effect on the graph) by supplying the argument --xla_dump_hlo_pass_re=XXX, where XXX is a regex describing which passes you want. TensorFlow will then render the XLA graph before and after every pass whose name matches that regex. For example, if you wanted to see the effect of every XLA HLO IPU pass involving while loops, you could use --xla_dump_hlo_pass_re=.*While.*. Finally, the number YYYY is an ID reflecting the order in which these graphs were generated, and the passes the graph was "between" when it was rendered are appended to the filename.
The "before_optimizations" graph is always rendered if dumping XLA.
Unfortunately, there is no formal way of knowing which XLA graph is your main program, as the unique IDs are somewhat arbitrary and the significance of each XLA graph's contents is tacit knowledge of the user. The closest approximation is probably the file or visual sizes - the main program's XLA graph should be much larger than the others. As a cruder method, you could include a very distinctive op in your main graph and search for it in the dumped XLA graphs.
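For example, a crude sketch of the size heuristic (same assumed dump directory), which simply picks the largest .dot file:

import glob
import os

DUMP_DIR = "/mnt/data/perf_prof_5/xla_dump"

# The main program's graph is usually by far the largest dumped file.
dot_files = glob.glob(os.path.join(DUMP_DIR, "*.dot"))
largest = max(dot_files, key=os.path.getsize)
print(largest, os.path.getsize(largest), "bytes")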