
Regex to match a C-style multiline comment

I have a string, for example:

String src = "How are things today /* this is comment **/ and is your code  /** this is another comment */ working?";

I want to remove the /* this is comment **/ and /** this is another comment */ substrings from the src string.

I tried to use a regex but failed because I have little experience with them.

asked Oct 22 '12 15:10 by hanumant



2 Answers

The best multiline comment regex is an unrolled version of (?s)/\*.*?\*/ that looks like

String pat = "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/"; 

See the regex demo and explanation at regex101.com.

In short,

  • /\* - match the comment start /*
  • [^*]*\*+ - match 0+ characters other than * followed by 1+ literal *
  • (?:[^/*][^*]*\*+)* - 0+ sequences of:
    • [^/*][^*]*\*+ - a character that is not / or * (matched with [^/*]) followed by 0+ non-asterisk characters ([^*]*) followed by 1+ asterisks (\*+)
  • / - closing /
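The breakdown above can be tried out with a minimal sketch like the following. It applies the unrolled pattern to the string from the question; the class name UnrolledCommentStripper and the helper stripComments are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.util.regex.Pattern;

class UnrolledCommentStripper {
    // Unrolled pattern from the answer, compiled once so it can be reused.
    private static final Pattern COMMENT =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Hypothetical helper: removes every /* ... */ comment from the input.
    static String stripComments(String src) {
        return COMMENT.matcher(src).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this is comment **/ and is your code"
                + "  /** this is another comment */ working?";
        System.out.println(stripComments(src));
    }
}
```

Only the comments themselves are removed, so the spaces that surrounded each comment stay in the result. Because the negated character classes also match line breaks, no (?s) modifier is needed for comments that span several lines.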

David's regex needs 26 steps to find the match in my example string, while mine needs just 12. With huge inputs, David's regex is likely to fail with a stack overflow or a similar error: the lazy .*? expands one character at a time and re-tests the rest of the pattern at every position, whereas my pattern consumes linear chunks of text in one go.
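The step counts come from the regex101 debugger and are not reproduced here. What can be checked directly in Java is that the two patterns find the same comments, as in this small sketch. The class name CommentPatternComparison, the helper findAll, and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentPatternComparison {
    // Lazy dot-matching version with the DOTALL modifier.
    static final Pattern LAZY = Pattern.compile("(?s)/\\*.*?\\*/");
    // Unrolled version from this answer.
    static final Pattern UNROLLED =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    // Hypothetical helper: collects every match of the pattern in the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "int a; /* one **/ int b; /** two\n * lines */ int c; /***/";
        List<String> lazy = findAll(LAZY, input);
        List<String> unrolled = findAll(UNROLLED, input);
        System.out.println("same matches: " + lazy.equals(unrolled));
    }
}
```

This only compares results on a small sample; it says nothing about speed, which depends on the regex engine and the input.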

answered Dec 15 '22 06:12 by Wiktor Stribiżew


Try using this regex (Single line comments only):

String src = "How are things today /* this is comment */ and is your code /* this is another comment */ working?";
String result = src.replaceAll("/\\*.*?\\*/", ""); // single line comments
System.out.println(result);

REGEX explained:

Match the character "/" literally

Match the character "*" literally

"." Match any single character except a line terminator

"*?" Between zero and unlimited times, as few times as possible, expanding as needed (lazy)

Match the character "*" literally

Match the character "/" literally

Alternatively, here is a regex that handles both single-line and multi-line comments, obtained by adding (?s):

// Note the added \n, which the previous regex cannot match across
String src = "How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
String result = src.replaceAll("(?s)/\\*.*?\\*/", "");
System.out.println(result);
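To see the effect of (?s) side by side, here is a minimal sketch that runs the same expression with and without it. It uses Pattern.DOTALL, the flag equivalent of the inline (?s) modifier; the class name DotallCommentDemo and the helper strip are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class DotallCommentDemo {
    // Without DOTALL, '.' does not match line terminators.
    static final Pattern SINGLE_LINE = Pattern.compile("/\\*.*?\\*/");
    // Pattern.DOTALL is the flag form of the inline (?s) modifier.
    static final Pattern ANY_LINES = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Hypothetical helper: removes every match of the pattern from the input.
    static String strip(Pattern p, String src) {
        return p.matcher(src).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "before /* spans\n two lines */ middle /* one line */ after";
        // The comment containing \n survives here.
        System.out.println(strip(SINGLE_LINE, src));
        // Both comments are removed here.
        System.out.println(strip(ANY_LINES, src));
    }
}
```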

Reference:

  • https://www.regular-expressions.info/examplesprogrammer.html

answered Dec 15 '22 07:12 by David Kroukamp