I have done a lot of research to figure out how a DFG can be created for an application from its source code. There are DFG's available online for certain applications such as MP3 Decoder, JPEG compression and H.263 Decoder.
I haven't been able to figure out how I can create a DFG for an application such as HEVC from its source code? Are there any tools which can instantly generate data flow graphs for such elaborate applications or does it have to be done manually?
Please advise me regarding this matter.
EDIT: I used Doxygen for HEVC and I could see how different functions were interacting with each other. However, every function had many entry and exit points and output of Doxygen became too confusing to follow after a while.
I also looked at StreamIt: http://camlunity.ru/swap/Library/Conflux/Stream%20Programming/streamit-cc_stream_graph_programming_language.pdf
It seemed handy but the graphs it generated for even simpler applications (like MP3 Decoder) were too complex. In order to generate a coherent DFG, will I have to re-write the entire source code?
The Data Flow Graph (DFG) model translates the data processing requirements into a data flow graph. The Data Flow Graph (DFG) is a data driven model in which the program execution is determined by data. This model emphasizes on the data and operations on the data which transforms the input data to output data.
A data-flow graph is a collection of arcs and nodes in which the nodes are either places where variables are assigned or used, and the arcs show the relationship between the places where a variable is assigned and where the assigned value is subsequently used.
It is the analysis of flow of data in control flow graph, i.e., the analysis that determines the information regarding the definition and use of data in program. With the help of this analysis, optimization can be done.
Our fundamental model for programs is the control/dataflow graph (CDFG). (We can also model hardware behaviorwiththe CDFG.) As the name implies, the CDFG has constructs that model bothdata operations (arithmetic and other computations) and controloperations (conditionals).
You want to extract data flow graphs from arbitrary languages. You imply you want a single way to do it. This isn't practical by hand... you need a tool.
Such a tool is singularly hard to build.
To do this, for each language you must be able to:
Each of these tasks by themselves are hard, because the languages tend to be complex. Most language tools that can do this at all (mostly compilers) do it only for one dialect of the language.
To do it for more than one language/dialect, you need a tool that can be configured for all the details for each language, and you have to configure for all the languages of interest. [Realistically you cannot "do them all"; there's thousands of computer languages in use right now].
Even limiting yourself to the "everyday" common programming languages, this is an enormous amount of work; it can take a few years to do all this well for a single mainstream language. You won't succeed in doing this by yourself.
My company builds a single, unified tool that is intended to be capable of doing this: the DMS Software Reengineering Toolkit. The simple "secret" is to realize that the machinery needed to accomplish the above tasks is actually very similar across languages, and can be designed to configured for a particular language with relatively modest (that doesn't mean "small") effort.
After 20 linear years of engineering with a team of PhD level engineers, we have parsers (even this is hard) for a surprising variety of languages, with full up data flow analyzers of the type you are talking about for C++ (check this link for examples), C, COBOL and almost Java 8.
I don't know of any other unified tools that are this far down the path toward your ideal. Check my bio before you decide I'm clueless about this. (Rascal/MPL has some ambitions but is a research tool at this point; they don't do C or C++ at all) We're only part way there, with many languages and scale battles to fight remaining.
[The goal of DMS isn't data flow analysis; that's just a stepping stone. It is to do automated code transformation, which requires data flow analysis to do safely and correctly].
Of course, you could just hope to find a separate tool for each language. You wouldn't get consistent quality or consistent style/granularity of data flow graphs from separate tools from different authors, if you can indeed get a full set of such tools at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With