I want to do some refactoring of code, especially the "include"-like relationships between files. There are quite a few of them, and to get started, it would be helpful to have a list, diagram, or even a columnar graph, so that I can see at a glance what is included from where.
(In many cases, a given file is included by multiple other files, so the graph would be a DAG, not a tree. There are no cycles.)
I'm working with TeX (actually ConTeXt), but the question would seem to apply to any programming language that has a facility like #include in C.
The obvious, easy answer is to do a grep or "Find in Files" on all the .tex files for the relevant keywords (\usemodule, \input, and a couple of other macros we've defined). This is better than nothing, but the output is long, and it's still difficult to see patterns in what includes what. For example, is file A usually included before file B? Is file C ever included multiple times by the same file?
I guess that brings out an additional, but optional, feature: such a tool would be able to show the sequence of includes from a particular file. In that case the DAG could be a multigraph, i.e. there could be multiple arcs from one file to another.
Ideally, it would be nice to be able to annotate each file, giving a very brief summary of what's in it. This would form part of the text on the graph node for that file.
Probably this sort of thing could be done with a script that generates graphviz dot language. But I wanted to know if it has already been done, rather than reinvent the wheel.
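For illustration, here is a minimal sketch of such a script in Python. It is only a starting point under stated assumptions: the regular expression handles just the simplest forms (\input name, \input{name}, \usemodule[name]), the function names find_includes and to_dot are made up for this example, and the optional notes mapping is a hypothetical way to attach the per-file summaries mentioned above. Each arc is labelled with its position in the including file, so repeated includes show up as multiple arcs.

```python
import re

# Simplest forms only: \input name, \input{name}, \usemodule[name].
# Project-specific include macros would have to be added to this pattern.
INCLUDE_RE = re.compile(
    r"\\(?:input|usemodule)\s*(?:\{([^}]+)\}|\[([^\]]+)\]|([^\s\\{}\[\]%]+))"
)

def find_includes(source):
    """Return the included names in the order they appear in one file's text."""
    names = []
    for line in source.splitlines():
        line = line.split("%", 1)[0]  # crude comment stripping; does not handle \%
        for match in INCLUDE_RE.finditer(line):
            names.append(next(g for g in match.groups() if g))
    return names

def to_dot(sources, notes=None):
    """Build DOT text from {filename: file contents}; notes maps filename -> summary."""
    notes = notes or {}
    lines = ["digraph includes {"]
    for name, note in sorted(notes.items()):
        # "\n" inside a DOT label starts a new line on the node
        lines.append(f'  "{name}" [label="{name}\\n{note}"];')
    for parent in sorted(sources):
        for order, child in enumerate(find_includes(sources[parent]), start=1):
            # the label records the include sequence within the parent file
            lines.append(f'  "{parent}" -> "{child}" [label="{order}"];')
    lines.append("}")
    return "\n".join(lines)
```

In practice you would fill sources by reading every .tex file in the project and then pass the resulting text to dot.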
As it is Friday in my country right now, and I'm waiting for my colleagues so we can go have a beer, I thought I'd do a little programming.
Here http://www.luki.webzdarma.cz/up/IncludeGraph.zip you can download source of a really simple utility that looks for all files in one folder, parses #includes and generates a .dot file for that.
It correctly handles relative paths, works on Windows, and should work on Linux as well. It is written in a very spartan way. My version of dot does not parse the generated files; there is some bug, but I really need to go drinking now, so see if you can fix it. I'm not a regular dot user and I don't see it, though I'm sure it is pretty obvious.
Enjoy ...
PS - If you run into trouble compiling and/or running, please let me know. Thanks.
EDIT
OK, my bad, there were a few glitches on Linux. The dot problem was that it was using "graph" instead of "digraph". But it is working like a charm now. Here is the link. Just type make, and if that succeeds, make test should generate the following diagram of the program itself:
It ignores preprocessor directives in the C++ files, so it is not very useful for that directly (this could be fixed by calling g++ with the preprocessor output flag and processing that output instead of the actual files). I didn't get to regexps today, but if you have any programming experience, you will find that it shouldn't be very hard to modify DotGraph.cpp to hard-code your inclusion token and to change the list of file extensions. I might get to regexps tomorrow or something.
A clever and general solution would be to trace the build system (using something like strace, LD_PRELOAD, patching the binaries, or some other debugging facility).
Once you'd collected the sequence of file open/close operations and filtered out the uninteresting stuff, it should be easy to build the dependency tree for any language, as long as the following assumptions are true:
Unfortunately, a well-written or poorly-written compiler might violate these assumptions by for instance only opening a file the first time it is included, or never closing any files.
Perhaps because of these limitations, I'm not aware of any implementation of this idea.
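To make the idea concrete, here is a rough Python sketch that turns strace-style lines into include edges. It is not an existing tool, just an illustration: it assumes a file is opened every time it is included and closed when that include finishes (exactly what the compilers mentioned above might not do), it only recognises lines shaped like open("path", ...) = fd, openat(AT_FDCWD, "path", ...) = fd and close(fd) = 0, and the name edges_from_trace and the default suffix filter are made up for this example.

```python
import re

# Successful opens only: a failed call returns -1, which (\d+) does not match.
OPEN_RE = re.compile(r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)".*\)\s*=\s*(\d+)')
CLOSE_RE = re.compile(r'close\((\d+)\)\s*=\s*0')

def edges_from_trace(trace_lines, suffixes=(".tex",)):
    """Infer (includer, included) pairs from the nesting of open/close calls."""
    stack = []  # (fd, path) of interesting files still open, innermost last
    edges = []
    for line in trace_lines:
        opened = OPEN_RE.search(line)
        if opened:
            path, fd = opened.group(1), int(opened.group(2))
            if path.endswith(suffixes):  # filter out the uninteresting stuff
                if stack:
                    # the innermost file still open is taken to be the includer
                    edges.append((stack[-1][1], path))
                stack.append((fd, path))
            continue
        closed = CLOSE_RE.search(line)
        if closed:
            fd = int(closed.group(1))
            # drop the most recently opened tracked file with this descriptor
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == fd:
                    del stack[i]
                    break
    return edges
```

A trace of several processes (for example with strace -f) would need the lines separated per process first, since file descriptors are per process.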
On the other hand, clever build systems may include functionality to compute or extract dependencies themselves. gcc has the -M option to output dependencies, and javac figures out dependencies on its own (although I don't know how to get it to output them).
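The -M output is a make rule (a target, a colon, then the files it depends on, with backslash-newline continuations), so it is easy to read back in. Here is a small Python sketch; the function name parse_make_deps is made up, and it deliberately ignores awkward cases such as paths containing spaces or colons. Note that the list is flattened: it tells you every file a source file ends up pulling in, not which header included which.

```python
def parse_make_deps(text):
    """Parse make-style dependency rules into {target: [dependencies]}."""
    deps = {}
    # join continuation lines (a backslash immediately before the newline)
    joined = text.replace("\\\n", " ")
    for rule in joined.splitlines():
        if ":" not in rule:
            continue  # blank line or something that is not a rule
        target, _, rest = rule.partition(":")
        deps.setdefault(target.strip(), []).extend(rest.split())
    return deps
```

Each (target, dependency) pair can then be written out as one arc of a dot digraph.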
As far as TeX goes, I don't know enough TeX to actually implement this, but conceptually it seems like it should be possible to redefine the low-level include command to:
You could then build your tree from the log output.
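As a sketch of that last step, suppose (purely as an assumption for this example) that the redefined command writes a line "INCLUDE-BEGIN name" to the log before reading a file and "INCLUDE-END name" afterwards. Those marker strings and the function name edges_from_log are hypothetical; a real log would use whatever your macro writes. Rebuilding the nesting in Python is then a matter of keeping a stack:

```python
def edges_from_log(log_lines, root="(top level)"):
    """Turn hypothetical INCLUDE-BEGIN/INCLUDE-END markers into (parent, child) pairs."""
    begin, end = "INCLUDE-BEGIN ", "INCLUDE-END "
    stack = [root]  # files currently being read, innermost last
    edges = []
    for line in log_lines:
        line = line.strip()
        if line.startswith(begin):
            name = line[len(begin):]
            edges.append((stack[-1], name))  # included by the file being read now
            stack.append(name)
        elif line.startswith(end) and len(stack) > 1:
            stack.pop()  # finished reading the innermost file
    return edges
```

Because the pairs come out in log order, the sequence of includes within each file is preserved, and a file included twice simply appears twice.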