
Regex matching imports of a class

Tags: java, regex

I've been trying to write a regex that matches the imports of a class. Suppose the class starts with:

import static org.junit.Assert.*;
import org.
       package.
       Test;
import mypackage.mystuff;

The output should be [org.junit.Assert.*, org.package.Test, mypackage.mystuff]. I'm struggling with the line breaks, and with regular expressions in general, since I'm not very experienced with them. This is my current attempt:

((?<=\bimport\s)\s*([^\s]+ )*([a-z.A-Z0-9]+.(?=;))) 
voskart asked Mar 22 '16 12:03


4 Answers

This (almost) suits your needs:

(?<=import (?:static )?+)[^;]+

"Almost" because each match still contains any line breaks and indentation inside the import (e.g. in your org.package.Test declaration). Strip that whitespace afterwards:

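import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s holds the Java source text to scan
Pattern pattern = Pattern.compile("(?<=import (?:static )?+)[^;]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
  // remove the line breaks and indentation of a multi-line import
  String match = matcher.group().replaceAll("\\s+", "");
  // do something with match
}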

In Java, \s matches [ \t\n\x0B\f\r]. Have a look at possessive quantifiers as well to understand the ?+ quantifier.
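
To get a feel for what a possessive quantifier does, here is a minimal sketch that is independent of the import pattern. The class and method names are made up for illustration. It compares the greedy a? with the possessive a?+ on the same input: the greedy form gives its character back when the rest of the pattern needs it, the possessive form never does.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Greedy '?': takes the 'a' first, then gives it back so the trailing 'a' can match.
    static boolean greedyMatches(String input) {
        return Pattern.matches("a?a", input);
    }

    // Possessive '?+': takes the 'a' and never gives it back.
    static boolean possessiveMatches(String input) {
        return Pattern.matches("a?+a", input);
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("a"));      // true
        System.out.println(possessiveMatches("a"));  // false
        System.out.println(possessiveMatches("aa")); // true
    }
}
```

On the single-character input the possessive version fails because the optional part has already consumed the only 'a' and refuses to backtrack.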

sp00m answered Nov 07 '22 07:11


This regex should work for all kinds of import statements and should not match invalid statements:

import\p{javaIdentifierIgnorable}*\p{javaWhitespace}+(?:static\p{javaIdentifierIgnorable}*\p{javaWhitespace}+)?(\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*|(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*)+(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*)?))\p{javaWhitespace}*;

It makes extensive use of Java's character categories, e.g. \p{javaWhitespace} calls Character.isWhitespace. From the Pattern documentation:

Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
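
As a small illustration of that rule, the sketch below checks a handful of characters against \p{javaWhitespace} and against Character.isWhitespace directly, and also against plain \s for comparison. The class name, method names and sample characters are my own choice, not part of the regex above.

```java
import java.util.regex.Pattern;

public class JavaCategoryDemo {
    static final Pattern JAVA_WS = Pattern.compile("\\p{javaWhitespace}");
    static final Pattern PLAIN_WS = Pattern.compile("\\s");

    // True if the single character matches the javaWhitespace category.
    static boolean matchesJavaWhitespace(char c) {
        return JAVA_WS.matcher(String.valueOf(c)).matches();
    }

    // True if the single character matches the predefined whitespace class (default flags).
    static boolean matchesPlainWhitespace(char c) {
        return PLAIN_WS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // space, tab, line feed, no-break space, em space, a letter
        char[] samples = {' ', '\t', '\n', '\u00A0', '\u2003', 'x'};
        for (char c : samples) {
            System.out.printf("U+%04X  javaWhitespace=%b  isWhitespace=%b  plain=%b%n",
                    (int) c, matchesJavaWhitespace(c), Character.isWhitespace(c),
                    matchesPlainWhitespace(c));
        }
    }
}
```

The first two columns always agree. The em space (U+2003) shows where the category differs from \s with default flags: Character.isWhitespace accepts it, \s does not.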

Still not readable? Guessed so. That's why I tried to express it with Java code (REGEX):

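import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportMatching {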
    static final String IMPORTS = "import\n" +
        "java.io.IOException;\n" +
        "import java.nio.file.Files;\n" +
        "import   java  .   nio .  file.   Path;\n" +
        "import  java.nio.file.Paths\n" +
        ";import java.util.ArrayList;\n" +
        "import  static   java.util. List.*;\n" +
        "import java.util.List. *;\n" +
        "import java.\n" +
        "       util.\n" +
        "       List;\n" +
        " import java.util.regex.Matcher;import java.util.regex.Pattern\n" +
        "         ;\n" +
        "import mypackage.mystuff;\n" +
        "import mypackage.*;";
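    // Whitespace as defined by Character.isWhitespace.
    static final String WS = "\\p{javaWhitespace}";
    // Ignorable characters as defined by Character.isIdentifierIgnorable.
    static final String IG = "\\p{javaIdentifierIgnorable}";
    // An identifier: one start character, then any number of part or ignorable characters.
    static final String ID = "\\p{javaJavaIdentifierStart}" + multiple(charClass("\\p{javaJavaIdentifierPart}" + IG));
    // A dot with optional whitespace on either side.
    static final String DOT = multiple(WS) + "\\." + multiple(WS);
    static final String WC = "\\*";
    // import [static] <qualified name, optionally ending in .*> ;
    static final String REGEX = "import" + multiple(IG) + atLeastOnce(WS) +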
        optional(nonCapturingGroup("static" + multiple(IG) + atLeastOnce(WS))) +
        group(
            ID +
            nonCapturingGroup(
                or(
                    DOT + WC,
                    atLeastOnce(nonCapturingGroup(DOT + ID)) + optional(nonCapturingGroup(DOT + WC))
                )
            )
        ) +
        multiple(WS) + ';';

    public static void main(String[] args) {
        final List<String> imports = getImports(IMPORTS);
        System.out.printf("Matches: %d%n", imports.size());
        imports.forEach(System.out::println);
    }

    static List<String> getImports(String javaSource) {
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(javaSource);
        List<String> imports = new ArrayList<>();
        while(matcher.find()) {
            imports.add(matcher.group(1).replaceAll(charClass(WS + IG), ""));
        }
        return imports;
    }

    static String nonCapturingGroup(String regex) {
        return group("?:" + regex);
    }

    static String or(String option1, String option2) {
        return option1 + '|' + option2;
    }

    static String atLeastOnce(String regex) {
        return regex + '+';
    }

    static String optional(String regex) {
        return regex + '?';
    }

    static String multiple(String regex) {
        return regex + '*';
    }

    static String group(String regex) {
        return '(' + regex + ')';
    }

    static String charClass(String regex) {
        return '[' + regex + ']';
    }
}

I capture the package.Class part in a single group and then strip the noise (whitespace and ignorable characters) from each match.

The test input is the following string (IMPORTS):

import
java.io.IOException;
import java.nio.file.Files;
import   java  .   nio .  file.   Path;
import  java.nio.file.Paths
;import java.util.ArrayList;
import  static   java.util. List.*;
import java.util.List. *;
import java.
       util.
       List;
 import java.util.regex.Matcher;import java.util.regex.Pattern
         ;
import mypackage.mystuff;
import mypackage.*;

The output:

Matches: 12
java.io.IOException
java.nio.file.Files
java.nio.file.Path
java.nio.file.Paths
java.util.ArrayList
java.util.List.*
java.util.List.*
java.util.List
java.util.regex.Matcher
java.util.regex.Pattern
mypackage.mystuff
mypackage.*
xehpuk answered Nov 07 '22 07:11


You can use this regex:

(\w+\.\n*\s*)+([\w\*]+)(?=\;)

Escaped for Java:

(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)
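
A rough sketch of how the escaped pattern could be used from Java, run against the imports from the question. The class and method names are made up for illustration. Since the whole match still contains the line breaks of a multi-line import, the sketch strips whitespace from each match. Note that the pattern does not mention import at all, so it also picks up any other dotted name that is directly followed by a semicolon, such as the a.b in x = a.b;.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedNameDemo {
    // The pattern from this answer, escaped for a Java string literal.
    static final Pattern DOTTED_NAME =
            Pattern.compile("(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)");

    // Returns every dotted name that is directly followed by ';', with whitespace removed.
    static List<String> extract(String source) {
        List<String> names = new ArrayList<>();
        Matcher matcher = DOTTED_NAME.matcher(source);
        while (matcher.find()) {
            // group() is the whole match, including line breaks inside a multi-line import
            names.add(matcher.group().replaceAll("\\s+", ""));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\n"
                + "       package.\n"
                + "       Test;\n"
                + "import mypackage.mystuff;\n";
        // prints [org.junit.Assert.*, org.package.Test, mypackage.mystuff]
        System.out.println(extract(source));
    }
}
```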

brso05 answered Nov 07 '22 09:11


Maybe this is what you are looking for?

(?<=\bimport)(\s*\R*\s*(?:[a-z0-9A-Z]+(\R|\s)+)*)((([a-zA-Z0-9]+\.)+)[a-zA-Z0-9]*\*?);
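
A quick way to try this pattern from Java, assuming group 3 is the one that holds the import name (it is the third opening parenthesis that captures). The class and method names are hypothetical. On the sample from the question this sketch finds the two single-line imports but not org.package.Test, because the pattern allows no whitespace between a dot and the next name part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupThreeDemo {
    // The pattern from this answer, escaped for a Java string literal (\R needs Java 8+).
    static final Pattern IMPORT = Pattern.compile(
            "(?<=\\bimport)(\\s*\\R*\\s*(?:[a-z0-9A-Z]+(\\R|\\s)+)*)"
            + "((([a-zA-Z0-9]+\\.)+)[a-zA-Z0-9]*\\*?);");

    // Collects group 3 of every match, i.e. the dotted name without 'import' or 'static'.
    static List<String> extract(String source) {
        List<String> names = new ArrayList<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find()) {
            names.add(matcher.group(3));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\n"
                + "       package.\n"
                + "       Test;\n"
                + "import mypackage.mystuff;\n";
        // prints [org.junit.Assert.*, mypackage.mystuff]
        System.out.println(extract(source));
    }
}
```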

Oliver Junk answered Nov 07 '22 07:11