
Which static analysis tool to use for scanning data flow from one method to another?

Say there are two methods in my library:

void com.somepackage.SomeClass.someSink(String s)

and

int com.someotherpackage.SomeOtherClass.someSource(int i)

In my code, the first method is used as a data sink and the second as a data source. The parameter types int and String are only examples and may differ in the actual situation.

I want to detect usages of these methods in code that satisfy the pattern given below:

  1. some data (say x) is generated by the source
  2. some data (say y) is generated from x using a series of transformations f1(f2(... fn(x)))
  3. y is given to the sink.

The transformations can be arbitrary functions, as long as there is a sequence of calls from the function that generates the data for the sink back to a function that takes in data from the source. The functions may take other parameters as well and are to be treated as black boxes.

The scanning can be at the source or bytecode level. What are the tools available out there for this type of analysis?

I prefer non-IDE-based tools with Java APIs.

[EDIT:] To clarify further, someSink and someSource are arbitrary method names in classes SomeClass and SomeOtherClass respectively. They may or may not be static and may take an arbitrary number of parameters (which I should be able to define). The types of the parameters are also arbitrary. The only requirement is that the tool should scan the code and output the line numbers where the pattern occurs. So the tool might work this way:

  • Obtain sink and source names (fully qualified name of class and method name) from user.
  • Statically scan the code and find all places where the given sink and source are used
  • Check if a path exists where some data output by source is given to sink either directly or indirectly via a series of operations (operators, methods).
  • Ignore those sources/sinks where no such path exists and output the remaining ones (if any).

Example output:

MyClass1.java:12: value1 = com.someotherpackage.SomeOtherClass.someSource(...)
MyClass2.java:23: value2 = foo(value1, ...)
MyClass3.java:3: value3 = bar(value2)
MyClass4.java:22: com.somepackage.SomeClass.someSink(value3, ...)

Note: A function that takes no parameters but has side effects on the data also needs to be considered. (Example: a = source(); void foo(){ c = a+b }; foo(); sink(c) is a pattern that needs to be caught.)
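To make the intended workflow concrete, here is a minimal sketch of the propagation step in Java. It is not taken from any existing tool: the class TaintSketch, the records Stmt and Taint, and the method findFlows are all hypothetical names, and the program is assumed to have already been reduced to simple statements of the form "target = callee(args)". A real tool would have to build that representation from source or bytecode first.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class TaintSketch {
    /** One simplified statement "target = callee(args)"; target is null for a bare call. (Hypothetical model.) */
    record Stmt(String location, String target, String callee, List<String> args) {}

    /** Why a variable is tainted: the defining statement and the already-tainted argument it came through. */
    record Taint(Stmt stmt, String viaArg) {}

    /** Returns one trace (locations, source first) for each sink call that receives source data. */
    static List<List<String>> findFlows(List<Stmt> program, String source, String sink) {
        Map<String, Taint> taint = new HashMap<>();
        boolean changed = true;
        // Flow-insensitive fixpoint: statement order is ignored, so data that moves
        // through shared variables (the side-effect case) is still picked up.
        while (changed) {
            changed = false;
            for (Stmt s : program) {
                if (s.target() == null || taint.containsKey(s.target())) continue;
                boolean fromSource = s.callee().equals(source);
                String via = s.args().stream().filter(taint::containsKey).findFirst().orElse(null);
                if (fromSource || via != null) {
                    // Every callee is a black box: any tainted argument taints the result.
                    taint.put(s.target(), new Taint(s, fromSource ? null : via));
                    changed = true;
                }
            }
        }
        List<List<String>> traces = new LinkedList<>();
        for (Stmt s : program) {
            if (!s.callee().equals(sink)) continue;
            for (String arg : s.args()) {
                if (taint.containsKey(arg)) {
                    traces.add(trace(arg, s, taint));
                    break; // one trace per sink call is enough for this sketch
                }
            }
        }
        return traces;
    }

    /** Walks back from the sink argument to the source along the recorded causes. */
    private static List<String> trace(String var, Stmt sinkCall, Map<String, Taint> taint) {
        LinkedList<String> path = new LinkedList<>();
        path.addFirst(sinkCall.location());
        for (String v = var; v != null; v = taint.get(v).viaArg()) {
            path.addFirst(taint.get(v).stmt().location());
        }
        return path;
    }

    public static void main(String[] args) {
        List<Stmt> program = List.of(
            new Stmt("MyClass1.java:12", "value1", "someSource", List.of()),
            new Stmt("MyClass2.java:23", "value2", "foo", List.of("value1", "other")),
            new Stmt("MyClass3.java:3", "value3", "bar", List.of("value2")),
            new Stmt("MyClass4.java:22", null, "someSink", List.of("value3")),
            new Stmt("MyClass5.java:7", null, "someSink", List.of("unrelated")));
        for (List<String> flow : findFlows(program, "someSource", "someSink")) {
            flow.forEach(System.out::println);
        }
    }
}
```

Because the sketch ignores statement order and treats every function as a black box, it over-approximates: it can report flows that cannot actually happen at run time, but it does catch the side-effect example above if the body of foo is modeled as c = plus(a, b). The sink call at MyClass5.java:7 is dropped because none of its arguments is derived from the source.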

Jus12 asked May 06 '12 19:05



2 Answers

After doing some research, I found that Soot is the best suited for this kind of task. Soot is more mature than other open-source alternatives such as PQL.

Jus12 answered Oct 19 '22 19:10


So the role of the source and sink methods is simply that x originates in the source method (somewhere) and is consumed (somewhere) in the target method? How do you characterize "x", or do you simply want all x that have this property?

Assuming you have identified a specific x in the source method, do you (a) insist that x be passed to the target method only by method calls [which would make the target method the last call in your chain of calls], or can one of the intermediate values be copied? And do you (b) insist that each function call has exactly one argument?

We have done something like this for large C systems. The problem was to trace an assigned variable into a use in other functions wherever they might be, including values not identical in representation but identical in intent ("abstract copy"; the string "1.0" is abstractly equivalent to the integer 1 if I eventually use the string as a number; "int_to_string" is an "abstract copy" function that converts a value in one representation to an equivalent value in another).

What we needed for this is a reaching definitions analysis for each function ("where does the value from a specific assignment go?"), and an "abstract copy" reaching analysis that determines where a reaching value is consumed by special functions tagged as "abstract copies", and where the result of that abstract copy function reaches to. A transitive closure over "x reaches z" and "x reaches f(x) reaches z" then computes where x can go.
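As a rough illustration of that last step only, the closure can be sketched in Java as a worklist traversal over a "reaches" graph. This is not DMS code: the class ReachClosure, its methods, and the node names are hypothetical, and the edges are assumed to have been produced already by the reaching-definitions and abstract-copy analyses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ReachClosure {
    /** Records "from reaches to": either a plain reaching definition or an abstract-copy step. */
    static void addEdge(Map<String, Set<String>> graph, String from, String to) {
        graph.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /** Transitive closure from one starting value: every place the value can reach. */
    static Set<String> reachableFrom(Map<String, Set<String>> graph, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(start);
        while (!work.isEmpty()) {
            String current = work.remove();
            for (String next : graph.getOrDefault(current, Set.of())) {
                if (seen.add(next)) {
                    work.add(next); // first visit: explore its successors later
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        // Assumed analysis results (illustrative names only):
        addEdge(graph, "x", "int_to_string.arg");          // x reaches the abstract-copy call
        addEdge(graph, "int_to_string.arg", "s");          // abstract copy: result s stands for x
        addEdge(graph, "s", "sink.arg");                   // s reaches the sink
        System.out.println(reachableFrom(graph, "x"));
    }
}
```

If the sink's argument appears in the set returned for x, the value assigned at x can reach the sink, possibly in a different representation.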

We did this using our DMS Software Reengineering Toolkit, which provides generic parsing and flow analysis machinery, and DMS's C Front End, which implements the specific reaching and abstract-copy-reaching computations for C. DMS has a Java Front End which computes reaching definitions; one would have to add the abstract-copy-reaching logic and reimplement the transitive closure code.

Ira Baxter answered Oct 19 '22 20:10