
Regex to fetch the correct java class name

Tags: java, regex

For some reason, I want to scan the content of a Java file (e.g. TagMatchingInterface.java) and fetch the class name (TagMatchingInterface) via regex, but my regex matches the wrong name because the keywords (class/interface/enum) also appear in the comment:

/**
 *
 * @author XXXX
 * Introduction: A common interface that judges all kinds of algorithm tags.
 * some other comment
 */
public class TagMatchingInterface 
{
  // content
  public class InnerClazz{
    // content
  }
}

Here is my pattern:

public Pattern CLASS_PATTERN = Pattern.compile("(?:public\\s)?(?:.*\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");
....
Matcher matcher = CLASS_PATTERN.matcher(content);
if (matcher.find()) {
   System.out.println(matcher.group(2));
}
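
A minimal, self-contained sketch that reproduces the problem. The class name ClassPatternRepro, the method firstName, and the inlined SAMPLE string are made up for illustration; only the pattern is the one shown above. Since `.*` can run through the comment line, the first hit lands on the word after "interface" in the comment:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reproduction harness; only the pattern comes from the question.
final class ClassPatternRepro {
    static final Pattern CLASS_PATTERN = Pattern.compile(
        "(?:public\\s)?(?:.*\\s)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");

    // The sample file from the question, inlined as a string.
    static final String SAMPLE =
        "/**\n"
        + " *\n"
        + " * @author XXXX\n"
        + " * Introduction: A common interface that judges all kinds of algorithm tags.\n"
        + " * some other comment\n"
        + " */\n"
        + "public class TagMatchingInterface \n"
        + "{\n"
        + "  // content\n"
        + "  public class InnerClazz{\n"
        + "    // content\n"
        + "  }\n"
        + "}\n";

    // Returns group 2 of the first match, or null when nothing matches.
    static String firstName(String content) {
        Matcher matcher = CLASS_PATTERN.matcher(content);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // Prints "that" (taken from the comment), not "TagMatchingInterface".
        System.out.println(firstName(SAMPLE));
    }
}
```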

Any idea what is wrong with my regex?

Asked by vash_ace, Mar 12 '23 05:03


1 Answer

Description

(?<=\n|\A)(?:public\s)?(class|interface|enum)\s([^\n\s]*)

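This regex does the following:

  • requires the match to begin at the start of the string or directly after a newline, so keywords that appear mid-line inside a comment are skipped
  • allows an optional leading public
  • matches class, interface, or enum
  • captures the name that follows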

Note: I recommend using the global and case-insensitive flags.
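
In Java, the expression could be applied roughly as follows. This is only a sketch: the class name ClassNameFinder, the method findNames, and the inlined SAMPLE string are made up for illustration. Java has no global flag, so looping over find() plays that role, and Pattern.CASE_INSENSITIVE stands in for the case-insensitive flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper wrapping the expression from this answer.
final class ClassNameFinder {
    // Same expression as above, with backslashes doubled for a Java string.
    static final Pattern DECLARATION = Pattern.compile(
        "(?<=\\n|\\A)(?:public\\s)?(class|interface|enum)\\s([^\\n\\s]*)",
        Pattern.CASE_INSENSITIVE);

    // The sample text, inlined as a string.
    static final String SAMPLE =
        "/**\n"
        + " *\n"
        + " * @author XXXX\n"
        + " * Introduction: A common interface that judges all kinds of algorithm tags.\n"
        + " * some other comment\n"
        + " */\n"
        + "public class TagMatchingInterface \n"
        + "{\n"
        + "  // content\n"
        + "  public class InnerClazz{\n"
        + "    // content\n"
        + "  }\n"
        + "}\n";

    // Collects group 2 (the name) of every match; the loop acts as the global flag.
    static List<String> findNames(String content) {
        List<String> names = new ArrayList<>();
        Matcher matcher = DECLARATION.matcher(content);
        while (matcher.find()) {
            names.add(matcher.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [TagMatchingInterface]; the indented inner class does not
        // start a line, so it is not reported.
        System.out.println(findNames(SAMPLE));
    }
}
```

To get only the first declaration, as in the question, a single `if (matcher.find())` would do instead of the loop.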

Example

Live Example

https://regex101.com/r/vR0iK3/1

Sample Text

/**
 *
 * @author XXXX
 * Introduction: A common interface that judges all kinds of algorithm tags.
 * some other comment
 */
public class TagMatchingInterface 
{
  // content
  public class InnerClazz{
    // content
  }
}

Sample Matches

[0][0] = public class TagMatchingInterface
[0][1] = class
[0][2] = TagMatchingInterface

Capture groups:

  • group 0 gets the entire match
  • group 1 gets the keyword (class, interface, or enum)
  • group 2 gets the name

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (?<=                     look behind to see if there is:
----------------------------------------------------------------------
    \n                       '\n' (newline)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \A                       the start of the string
----------------------------------------------------------------------
  )                        end of look-behind
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    public                   'public'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    class                    'class'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    interface                'interface'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    enum                     'enum'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [^\n\s]*                 any character except: '\n' (newline),
                             whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
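
One consequence of the breakdown above: group 2 takes every non-whitespace character, so a declaration written as `class Foo{` at the start of a line would be captured as `Foo{`. A possible refinement, offered as a sketch and not as part of the expression in this answer, is to reuse the identifier character class from the question and to allow more than one whitespace character after each keyword. The names StrictClassNameFinder and firstName are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant, not the expression from the answer itself.
final class StrictClassNameFinder {
    // Same line-start anchoring as above; group 2 is limited to identifier
    // characters, and whitespace runs may be longer than one character.
    static final Pattern DECLARATION = Pattern.compile(
        "(?<=\\n|\\A)(?:public\\s+)?(class|interface|enum)\\s+([$_a-zA-Z][$_a-zA-Z0-9]*)");

    // Returns the first declared name, or null when nothing matches.
    static String firstName(String content) {
        Matcher matcher = DECLARATION.matcher(content);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // The brace is no longer part of the captured name.
        System.out.println(firstName("public class Foo{\n}")); // prints "Foo"
    }
}
```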
Answered by Ro Yo Mi, Mar 30 '23 06:03