Java String Analysis for complete string regular expression

Tags: java, string, regex

I am looking for a tool like Java String Analysis (JSA) that can summarize all the values a string may take as a regular expression. I have tried to do this with JSA, but there I have to search for a specific method such as StringBuffer.append or another string operation.

I have code like this:

        StringBuilder test=new StringBuilder("hello ");
        boolean condition=false;
        if(condition){
            test.append("world");
        }
        else{
            test.append("other world");
        }
        test.append(" so far");
        for(int i=0;i<args.length;i++){
            test.append(" again hello");
        }

        // regularExpression = "hello (world|other world) so far( again hello)*"

My JSA implementation so far looks like this:

    public static void main(String[] args) {
        StringAnalysis.addDirectoryToClassPath("bootstrap.jar");

        StringAnalysis.loadClass("org.apache.catalina.loader.Extension");
        List<ValueBox> list = StringAnalysis.getArgumentExpressions("<java.lang.StringBuffer: java.lang.StringBuffer append(java.lang.String)>", 0);

        StringAnalysis sa = new StringAnalysis(list);
        for (ValueBox e : list) {
            Automaton a = sa.getAutomaton(e);
            if (a.isFinite()) {
                Iterator<String> si = a.getFiniteStrings().iterator();
                StringBuilder sb = new StringBuilder();
                while (si.hasNext()) {
                    sb.append(si.next());
                }
                System.out.println(sb.toString());
            } else if (a.complement().isEmpty()) {
                System.out.println(e.getValue());
            } else {
                System.out.println("common prefix:" + a.getCommonPrefix());
            }
        }

    }

I would appreciate any help with the JSA tool or a hint about another tool. My biggest issue in building the regex is the control-flow structure around the string constants.

asked Sep 01 '15 07:09 by Leonid Glanz


1 Answer

I'm not aware of a tool that produces such a regex out of the box.

But since your main issue is the control-flow graph (CFG), I would recommend writing a static analysis tailored to your problem. You could use a static analysis/bytecode framework such as OPAL (Scala) or Soot (Java); you will find tutorials on each project's page.

Once it is set up, you can load the target jar. You should then be able to leverage the control flow of the program, as in the following example:

    1 public static void example(String unknown) {
    2   String source = "hello";
    3   if (Math.random() * 20 > 5) {
    4       source += "world";
    5   } else {
    6       source += "unknown";
    7   }
    8   source += unknown;
    9 }

When your analysis finds a String or StringBuilder being initialized, you can start building your regular expression. Line 2, for instance, sets the regex to "hello". When you reach a conditional in the control flow, you can analyze each path separately and combine the results with "|" afterwards.

Then branch: "world" (line 4)
Else branch: "unknown" (line 6)

At line 7 the two branches can be merged into (world)|(unknown), and that alternation is appended to the regex built before the conditional.
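
The branch merging described above can be sketched with a few string helpers. This is only an illustration, not part of JSA, OPAL, or Soot: the class RegexSketch and its methods literal, concat, alt, and star are hypothetical names, and a real analysis would call them while walking the control-flow graph rather than by hand. It is applied here to the StringBuilder example from the question and uses only java.util.regex from the standard library.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: composes regex fragments the way an analysis might while walking the control flow. */
public class RegexSketch {

    /** A string constant becomes a quoted literal, so characters like '.' stay literal. */
    static String literal(String constant) {
        return Pattern.quote(constant);
    }

    /** Sequential appends: concatenate the fragments in program order. */
    static String concat(String... parts) {
        return String.join("", parts);
    }

    /** if/else: one alternative per branch, joined with '|'. */
    static String alt(String... branches) {
        return "(?:" + String.join("|", branches) + ")";
    }

    /** Loop with an unknown iteration count: zero or more repetitions of the body. */
    static String star(String body) {
        return "(?:" + body + ")*";
    }

    /** Regex for the StringBuilder example from the question. */
    static String questionExample() {
        return concat(
                literal("hello "),
                alt(literal("world"), literal("other world")),
                literal(" so far"),
                star(literal(" again hello")));
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(questionExample());
        System.out.println(p.matcher("hello world so far").matches());
        System.out.println(p.matcher("hello other world so far again hello again hello").matches());
        System.out.println(p.matcher("hello so far").matches());
    }
}
```

Quoting each constant with Pattern.quote keeps characters such as '.' or '(' in the analyzed strings from being read as regex operators.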

If you encounter a variable, you can either trace it back to its definition, if you perform an inter-procedural analysis, or otherwise fall back to the wildcard ".*".

Final regex: "hello((world)|(unknown)).*"
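
A quick way to sanity-check such a result is to run it against values the example method can produce. The sketch below is an illustration under assumptions: the class FinalRegexCheck and the method covers are hypothetical names, and the Pattern.DOTALL flag is an addition not mentioned above. It is passed because in Java "." does not match line terminators by default, so a plain ".*" would not cover an unknown argument that contains a newline.

```java
import java.util.regex.Pattern;

/** Hypothetical check of the final regex against possible runtime values. */
public class FinalRegexCheck {

    // Regex taken from the answer above.
    static final String FINAL_REGEX = "hello((world)|(unknown)).*";

    /** Returns true if the whole value is matched; flags are passed through to Pattern.compile. */
    static boolean covers(String value, int flags) {
        return Pattern.compile(FINAL_REGEX, flags).matcher(value).matches();
    }

    public static void main(String[] args) {
        // Then branch followed by some argument value.
        System.out.println(covers("helloworldabc", Pattern.DOTALL));
        // Else branch followed by an empty argument.
        System.out.println(covers("hellounknown", Pattern.DOTALL));
        // Argument containing a newline: only covered when DOTALL is set.
        System.out.println(covers("helloworld\nsecond line", Pattern.DOTALL));
        System.out.println(covers("helloworld\nsecond line", 0));
        // Neither branch taken: cannot happen in the example, so it must not match.
        System.out.println(covers("hello", Pattern.DOTALL));
    }
}
```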

I hope this helps you get to the solution you are after.

answered Oct 03 '22 12:10 by M. Reif