I've been trying to write a regex to match the imports of a class. Let the class's imports be:
import static org.junit.Assert.*;
import org.
package.
Test;
import mypackage.mystuff;
The output should be [org.junit.Assert.*, org.package.Test, mypackage.mystuff]. I've been struggling with the line breaks and with regular expressions in general since I'm not that experienced with them. This is my current attempt:
((?<=\bimport\s)\s*([^\s]+ )*([a-z.A-Z0-9]+.(?=;)))
This (almost) suits your needs:
(?<=import (?:static )?+)[^;]+
Almost, because the matches include any line breaks inside the name (e.g. in your org.package.Test declaration), so those have to be removed afterwards:
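import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s is the source text to scan
Pattern pattern = Pattern.compile("(?<=import (?:static )?+)[^;]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // remove the line breaks (and any other \s characters) inside the match
    String match = matcher.group().replaceAll("\\s+", "");
    // do something with match
}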
In Java, \s matches [ \t\n\x0B\f\r]. Have a look at possessive quantifiers as well to understand the ?+ quantifier.
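For a self-contained check, here is a minimal sketch of the same idea. It swaps the lookbehind for a plain capturing group, import\s+(?:static\s+)?([^;]+);, so no possessive quantifier is needed to keep static out of the result. The class and method names (SimpleImports, imports) are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleImports {
    // Variant of the answer's regex: a capturing group instead of the lookbehind.
    // The optional "static" part sits outside group 1, so it never ends up in the name.
    private static final Pattern IMPORT =
            Pattern.compile("\\bimport\\s+(?:static\\s+)?([^;]+);");

    static List<String> imports(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find()) {
            // Remove the line breaks (and any other \s characters) inside the name
            result.add(matcher.group(1).replaceAll("\\s+", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\npackage.\nTest;\n"
                + "import mypackage.mystuff;\n";
        System.out.println(imports(source));
        // prints [org.junit.Assert.*, org.package.Test, mypackage.mystuff]
    }
}
```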
This regex should work for all kinds of import statements and should not match invalid statements:
import\p{javaIdentifierIgnorable}*\p{javaWhitespace}+(?:static\p{javaIdentifierIgnorable}*\p{javaWhitespace}+)?(\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*|(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*)+(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*)?))\p{javaWhitespace}*;
It makes extensive use of Java's character categories; e.g. \p{javaWhitespace} calls Character.isWhitespace. From the java.util.regex.Pattern documentation:
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
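A quick way to see how these categories differ from the plain escapes, as a minimal sketch (CategoryDemo and matches are illustrative names): U+001C is whitespace according to Character.isWhitespace, so \p{javaWhitespace} accepts it, while the default \s does not.

```java
class CategoryDemo {
    // True if the single character ch matches the given regex
    static boolean matches(char ch, String regex) {
        return String.valueOf(ch).matches(regex);
    }

    public static void main(String[] args) {
        char fileSeparator = (char) 0x1C; // U+001C, the "file separator" control character
        // Character.isWhitespace treats U+001C as whitespace, so \p{javaWhitespace} matches it
        System.out.println(matches(fileSeparator, "\\p{javaWhitespace}")); // true
        // The default \s is only [ \t\n\x0B\f\r], so it does not match
        System.out.println(matches(fileSeparator, "\\s"));                 // false
        // \p{javaJavaIdentifierStart} mirrors Character.isJavaIdentifierStart
        System.out.println(matches('$', "\\p{javaJavaIdentifierStart}"));  // true
        System.out.println(matches('1', "\\p{javaJavaIdentifierStart}"));  // false
        // \p{javaJavaIdentifierPart} mirrors Character.isJavaIdentifierPart
        System.out.println(matches('1', "\\p{javaJavaIdentifierPart}"));   // true
    }
}
```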
Still not readable? I guessed so. That's why I also expressed it in Java code (REGEX):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImportMatching {
    // Test input: import statements with irregular whitespace and line breaks
    static final String IMPORTS = "import\n" +
            "java.io.IOException;\n" +
            "import java.nio.file.Files;\n" +
            "import java . nio . file. Path;\n" +
            "import java.nio.file.Paths\n" +
            ";import java.util.ArrayList;\n" +
            "import static java.util. List.*;\n" +
            "import java.util.List. *;\n" +
            "import java.\n" +
            " util.\n" +
            " List;\n" +
            " import java.util.regex.Matcher;import java.util.regex.Pattern\n" +
            " ;\n" +
            "import mypackage.mystuff;\n" +
            "import mypackage.*;";

    // Building blocks of the regex
    static final String WS = "\\p{javaWhitespace}";
    static final String IG = "\\p{javaIdentifierIgnorable}";
    static final String ID = "\\p{javaJavaIdentifierStart}" + multiple(charClass("\\p{javaJavaIdentifierPart}" + IG));
    static final String DOT = multiple(WS) + "\\." + multiple(WS);
    static final String WC = "\\*";

    // import [static] <qualified name, optionally ending in .*> ;
    static final String REGEX = "import" + multiple(IG) + atLeastOnce(WS) +
            optional(nonCapturingGroup("static" + multiple(IG) + atLeastOnce(WS))) +
            group(
                    ID +
                    nonCapturingGroup(
                            or(
                                    DOT + WC,
                                    atLeastOnce(nonCapturingGroup(DOT + ID)) + optional(nonCapturingGroup(DOT + WC))
                            )
                    )
            ) +
            multiple(WS) + ';';

    public static void main(String[] args) {
        final List<String> imports = getImports(IMPORTS);
        System.out.printf("Matches: %d%n", imports.size());
        imports.forEach(System.out::println);
    }

    static List<String> getImports(String javaSource) {
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(javaSource);
        List<String> imports = new ArrayList<>();
        while (matcher.find()) {
            // Group 1 is the package.Class part; drop whitespace and ignorable characters
            imports.add(matcher.group(1).replaceAll(charClass(WS + IG), ""));
        }
        return imports;
    }

    // Small helpers that wrap a regex fragment
    static String nonCapturingGroup(String regex) {
        return group("?:" + regex);
    }

    static String or(String option1, String option2) {
        return option1 + '|' + option2;
    }

    static String atLeastOnce(String regex) {
        return regex + '+';
    }

    static String optional(String regex) {
        return regex + '?';
    }

    static String multiple(String regex) {
        return regex + '*';
    }

    static String group(String regex) {
        return '(' + regex + ')';
    }

    static String charClass(String regex) {
        return '[' + regex + ']';
    }
}
I'm using one group for the package.Class part and then removing any noise (whitespace and identifier-ignorable characters) from the matches.
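To see what that noise removal does on its own, here is a small sketch using the same character class that charClass(WS + IG) builds. The class name NoiseRemoval and the method clean are made up for illustration; the soft hyphen (U+00AD) is just one example of a character that Character.isIdentifierIgnorable accepts.

```java
class NoiseRemoval {
    // Same character class as charClass(WS + IG) in the answer
    static final String NOISE = "[\\p{javaWhitespace}\\p{javaIdentifierIgnorable}]";

    static String clean(String captured) {
        return captured.replaceAll(NOISE, "");
    }

    public static void main(String[] args) {
        System.out.println(clean("java . nio . file. Path")); // java.nio.file.Path
        System.out.println(clean("java.\n util.\n List"));    // java.util.List
        // U+00AD (soft hyphen) is a format character, hence identifier-ignorable
        System.out.println(clean("my\u00ADpackage.mystuff")); // mypackage.mystuff
    }
}
```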
The test input is the following string (IMPORTS):
import
java.io.IOException;
import java.nio.file.Files;
import java . nio . file. Path;
import java.nio.file.Paths
;import java.util.ArrayList;
import static java.util. List.*;
import java.util.List. *;
import java.
util.
List;
import java.util.regex.Matcher;import java.util.regex.Pattern
;
import mypackage.mystuff;
import mypackage.*;
The output:
Matches: 12
java.io.IOException
java.nio.file.Files
java.nio.file.Path
java.nio.file.Paths
java.util.ArrayList
java.util.List.*
java.util.List.*
java.util.List
java.util.regex.Matcher
java.util.regex.Pattern
mypackage.mystuff
mypackage.*
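To check the claim that invalid statements are rejected, here is a quick sketch that compiles the one-line regex from above and runs find() on a few statements. ImportValidation and containsImport are illustrative names, and the sample inputs are my own.

```java
import java.util.regex.Pattern;

class ImportValidation {
    // The one-line regex from this answer, escaped for a Java string literal
    static final Pattern IMPORT = Pattern.compile(
            "import\\p{javaIdentifierIgnorable}*\\p{javaWhitespace}+"
            + "(?:static\\p{javaIdentifierIgnorable}*\\p{javaWhitespace}+)?"
            + "(\\p{javaJavaIdentifierStart}[\\p{javaJavaIdentifierPart}\\p{javaIdentifierIgnorable}]*"
            + "(?:\\p{javaWhitespace}*\\.\\p{javaWhitespace}*\\*"
            + "|(?:\\p{javaWhitespace}*\\.\\p{javaWhitespace}*\\p{javaJavaIdentifierStart}[\\p{javaJavaIdentifierPart}\\p{javaIdentifierIgnorable}]*)+"
            + "(?:\\p{javaWhitespace}*\\.\\p{javaWhitespace}*\\*)?))"
            + "\\p{javaWhitespace}*;");

    static boolean containsImport(String source) {
        return IMPORT.matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println(containsImport("import java.util.List;"));           // true
        System.out.println(containsImport("import static java.util. List.*;")); // true
        System.out.println(containsImport("import 1java.util.List;"));          // false: starts with a digit
        System.out.println(containsImport("import java.util.*.List;"));         // false: * must come last
        System.out.println(containsImport("import java.util.List"));           // false: no semicolon
    }
}
```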
You can use this regex:
(\w+\.\n*\s*)+([\w\*]+)(?=\;)
Escaped for Java:
(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)
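A minimal harness for this pattern in Java (DottedNameRegex and names are hypothetical names). Only the whole match carries the full name, so it takes group() and strips the line breaks, as in the first answer. Note that nothing in the pattern anchors on import, so any dotted name directly before a ; is picked up as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedNameRegex {
    // The regex from this answer, escaped for Java
    static final Pattern NAME = Pattern.compile("(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)");

    static List<String> names(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NAME.matcher(source);
        while (matcher.find()) {
            // group() is the whole dotted name; strip the line breaks it may contain
            result.add(matcher.group().replaceAll("\\s+", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\npackage.\nTest;\n"
                + "import mypackage.mystuff;\n";
        System.out.println(names(source));        // [org.junit.Assert.*, org.package.Test, mypackage.mystuff]
        // Not anchored on "import": other dotted names before ';' match too
        System.out.println(names("x = foo.bar;")); // [foo.bar]
    }
}
```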
Maybe this is what you are looking for?
(?<=\bimport)(\s*\R*\s*(?:[a-z0-9A-Z]+(\R|\s)+)*)((([a-zA-Z0-9]+\.)+)[a-zA-Z0-9]*\*?);
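A small harness for this pattern, taking group 3 as the qualified name (LookbehindImportRegex and names are hypothetical). As far as I can tell the name has to be on a single line: [a-zA-Z0-9]+\. allows no whitespace around the dots, so the multi-line org. package. Test statement from the question is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindImportRegex {
    // The regex from this answer, escaped for Java (\R needs Java 8 or later)
    static final Pattern IMPORT = Pattern.compile(
            "(?<=\\bimport)(\\s*\\R*\\s*(?:[a-z0-9A-Z]+(\\R|\\s)+)*)((([a-zA-Z0-9]+\\.)+)[a-zA-Z0-9]*\\*?);");

    static List<String> names(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = IMPORT.matcher(source);
        while (matcher.find()) {
            // Group 1 swallows whitespace and "static"; group 3 is the qualified name
            result.add(matcher.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\npackage.\nTest;\n"
                + "import mypackage.mystuff;\n";
        System.out.println(names(source));                     // [org.junit.Assert.*, mypackage.mystuff]
        System.out.println(names("import org.package.Test;")); // [org.package.Test]
    }
}
```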