How can one create a data flow graph (DFG/SDFG) for any application from its source code

Tags:

I have done a lot of research to figure out how a DFG can be created for an application from its source code. There are DFG's available online for certain applications such as MP3 Decoder, JPEG compression and H.263 Decoder.

I haven't been able to figure out how I can create a DFG for an application such as HEVC from its source code? Are there any tools which can instantly generate data flow graphs for such elaborate applications or does it have to be done manually?

Please advise me regarding this matter.

EDIT: I used Doxygen for HEVC and I could see how different functions were interacting with each other. However, every function had many entry and exit points and output of Doxygen became too confusing to follow after a while.

I also looked at StreamIt: http://camlunity.ru/swap/Library/Conflux/Stream%20Programming/streamit-cc_stream_graph_programming_language.pdf

It seemed handy but the graphs it generated for even simpler applications (like MP3 Decoder) were too complex. In order to generate a coherent DFG, will I have to re-write the entire source code?

300

asked Jan 04 '17 21:01

a_sid

1 Answers

You want to extract data flow graphs from arbitrary languages. You imply you want a single way to do it. This isn't practical by hand... you need a tool.

Such a tool is singularly hard to build.

To do this, for each language you must be able to:

Define the language to the tool, in the form you find it in practice (not just the language reference manual version). C++ in the wild is bent in a lot of funny ways compared to the standard.
Parse programs in the language as found in the field, perhaps as one file, perhaps as tens of thousands; some programs aren't small.
Build structures representing the language elements and their relations to one another (this is often done as an abstract syntax tree)
Determine for each literal what its actual value is; "a\xbc" has very different values depending on whether the language thinks it is ascii or unicode text with escape sequences
Find all the identifiers in the code, and determine for each one which definitional/type information is associated with it according the language scoping rules
Determine the sources of data (literal values, inputs from the outside world, results of expressions) and track where those data values are used in other parts of the program across various control flow constructs
Presumably draw some picture of the resulting data flow.

Each of these tasks by themselves are hard, because the languages tend to be complex. Most language tools that can do this at all (mostly compilers) do it only for one dialect of the language.

To do it for more than one language/dialect, you need a tool that can be configured for all the details for each language, and you have to configure for all the languages of interest. [Realistically you cannot "do them all"; there's thousands of computer languages in use right now].

Even limiting yourself to the "everyday" common programming languages, this is an enormous amount of work; it can take a few years to do all this well for a single mainstream language. You won't succeed in doing this by yourself.

My company builds a single, unified tool that is intended to be capable of doing this: the DMS Software Reengineering Toolkit. The simple "secret" is to realize that the machinery needed to accomplish the above tasks is actually very similar across languages, and can be designed to configured for a particular language with relatively modest (that doesn't mean "small") effort.

After 20 linear years of engineering with a team of PhD level engineers, we have parsers (even this is hard) for a surprising variety of languages, with full up data flow analyzers of the type you are talking about for C++ (check this link for examples), C, COBOL and almost Java 8.

I don't know of any other unified tools that are this far down the path toward your ideal. Check my bio before you decide I'm clueless about this. (Rascal/MPL has some ambitions but is a research tool at this point; they don't do C or C++ at all) We're only part way there, with many languages and scale battles to fight remaining.

[The goal of DMS isn't data flow analysis; that's just a stepping stone. It is to do automated code transformation, which requires data flow analysis to do safely and correctly].

Of course, you could just hope to find a separate tool for each language. You wouldn't get consistent quality or consistent style/granularity of data flow graphs from separate tools from different authors, if you can indeed get a full set of such tools at all.

answered Nov 14 '22 01:11

Ira Baxter

Related questions
                            
                                Count the number of number x that has digit sum equal the digit sum of x*m
                            
                                Recursive(?) algorithm design
                            
                                Complexity in a recursion algorithm
                            
                                Geolocations - How to check if 2 circles are overlapping
                            
                                Can binary search tree implement a Map?
                            
                                Java Quicksort why / where do the values change?
                            
                                How to deal with duplicates in red-black trees?
                            
                                What is the fastest way to calculate determinant?
                            
                                How to efficiently draw exactly N points on screen?
                            
                                Minimum number of operations to execute an associative operations on overlapping sets
                            
                                Most efficient way to find index of matching values in two sorted arrays using C++
                            
                                Is an iterative solution for Fibonacci series pseudo-polynomial?
                            
                                get the number of overlapping intervals, given two lists of intervals
                            
                                summing array of doubles with large value span : proper algorithm
                            
                                Detecting loops in tree structures (graphs)
                            
                                Data structure + algorithm for ipv4 storage - efficient searching in prefixes
                            
                                Space Complexity of Quick Sort
                            
                                What is the time complexity of the given code?
                            
                                Interesting 16x16 grid sum
                            
                                Efficient to delete all substrings of other elements within an array in Ruby

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can one create a data flow graph (DFG/SDFG) for any application from its source code

Tags:

algorithm

data-structures

parallel-processing

open-source

dataflow

a_sid

People also ask

1 Answers

Ira Baxter

Recent Activity

Donate For Us