I want to scan the content of a Java file (e.g. TagMatchingInterface.java) and extract the class name (TagMatchingInterface) with a regex, but my regex matches the wrong name because the keywords (class/interface/enum) also appear in the comment:
/**
*
* @author XXXX
* Introduction: A common interface that judges all kinds of algorithm tags.
* some other comment
*/
public class TagMatchingInterface
{
// content
public class InnerClazz{
// content
}
}
Here is my pattern:
public Pattern CLASS_PATTERN = Pattern.compile("(?:public\\s)?(?:.*\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");
....
Matcher matcher = CLASS_PATTERN.matcher(content);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
Any idea what is wrong with my regex?
(?<=\n|\A)(?:public\s)?(class|interface|enum)\s([^\n\s]*)
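This regex does the following:
matches only at the start of a line or at the start of the text
matches the declaration whether it starts with public or not
captures the keyword: class, or interface, or enum
captures the name that follows the keyword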
Note: I recommend using the global and case-insensitive flags.
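For reference, here is a minimal sketch of how the regex above could be used from Java. The class name ClassNameFinder and the method firstTypeName are hypothetical names chosen for this illustration, and the sample source is a shortened version of the file from the question. Backslashes are doubled because the pattern sits inside a Java string literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class ClassNameFinder {
    // The regex from this answer, escaped for a Java string literal.
    private static final Pattern TYPE_DECLARATION =
            Pattern.compile("(?<=\\n|\\A)(?:public\\s)?(class|interface|enum)\\s([^\\n\\s]*)");

    // Returns the name captured by group 2 of the first match, if any.
    static Optional<String> firstTypeName(String source) {
        Matcher matcher = TYPE_DECLARATION.matcher(source);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "/**",
                " * Introduction: A common interface that judges all kinds of algorithm tags.",
                " */",
                "public class TagMatchingInterface",
                "{",
                "    public class InnerClazz{",
                "    }",
                "}");
        // The word "interface" in the comment is skipped because it does not
        // start a line; the indented inner class is skipped for the same reason.
        System.out.println(firstTypeName(source).orElse("<none>"));
    }
}
```

Only the first match is returned here, which is enough when the goal is the top-level type name of the file.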
Live Example
https://regex101.com/r/vR0iK3/1
Sample Text
/**
*
* @author XXXX
* Introduction: A common interface that judges all kinds of algorithm tags.
* some other comment
*/
public class TagMatchingInterface
{
// content
public class InnerClazz{
// content
}
}
Sample Matches
[0][0] = public class TagMatchingInterface
[0][1] = class
[0][2] = TagMatchingInterface
Explanation:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?<=                     look behind to see if there is:
----------------------------------------------------------------------
  \n                     '\n' (newline)
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  \A                     Start of the string
----------------------------------------------------------------------
)                        end of look-behind
----------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
----------------------------------------------------------------------
  public                 'public'
----------------------------------------------------------------------
  \s                     whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)?                       end of grouping
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
  class                  'class'
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  interface              'interface'
----------------------------------------------------------------------
 |                       OR
----------------------------------------------------------------------
  enum                   'enum'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
\s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
  [^\n\s]*               any character except: '\n' (newline),
                         whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
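One consequence of the breakdown above: group 2 accepts any run of non-whitespace characters, so a declaration written as `class Foo{` would be captured as `Foo{`. A possible variant, shown only as a sketch, swaps in the identifier character class from the question's own pattern and allows one or more whitespace characters after the keyword. Java has no global flag, so repeated calls to find() play that role. The names StrictTypeNameFinder and typeNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class StrictTypeNameFinder {
    // Same anchoring as the answer's regex, but group 2 uses the identifier
    // character class taken from the question's pattern.
    private static final Pattern TYPE_DECLARATION = Pattern.compile(
            "(?<=\\n|\\A)(?:public\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");

    // Collects the name from every declaration that starts a line.
    static List<String> typeNames(String source) {
        List<String> names = new ArrayList<>();
        Matcher matcher = TYPE_DECLARATION.matcher(source);
        while (matcher.find()) { // each find() call moves on to the next match
            names.add(matcher.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "public class Outer{\n}\nenum Color {\n}\n";
        System.out.println(typeNames(source)); // prints [Outer, Color]
    }
}
```

Like the original regex, this still ignores declarations that are indented or that carry other modifiers such as final or abstract.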