
RegEx expression that will capture everything between two characters including multiline blocks

Tags:

regex

I want to capture all text & blocks of text between <% and %>.

For example:

<html>
<head>
<title>Title Here</title>
</head>
<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%

variable = value;

foreach(params here)
{
    code here
}

%>
<p><a href="/" title="Home">Home</a></p>
</body>
</html>

I have tried \<\%(.*)\%\>, but that captures everything, including the <h1>Test Template</h1> block.

Lark asked Oct 22 '10 20:10



2 Answers

Which regex engine are you using?

<%(.*?)%>

should work with the "dot matches newline" option enabled. If you don't know how to set that, try

<%([\s\S]*?)%>

or

(?s)<%(.*?)%>

No need to escape <, %, or > by the way.
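
As a rough illustration of the three variants above, here is a minimal sketch in Python. The engine is an assumption, since the question does not say which one is in use; `re.DOTALL` is Python's name for the "dot matches newline" option, and the `template` string is a shortened copy of the example from the question.

```python
import re

# Shortened copy of the template from the question (illustrative only).
template = """<body>
<% include("/path/to/include") %>
<h1>Test Template</h1>
<p>Variable: <% print(second_var) %></p>
<%

variable = value;

%>
</body>"""

# Variant 1: lazy match with the "dot matches newline" option set as a flag.
with_flag = re.findall(r"<%(.*?)%>", template, flags=re.DOTALL)

# Variant 2: [\s\S] matches any character, newlines included, with no flag.
with_class = re.findall(r"<%([\s\S]*?)%>", template)

# Variant 3: the same option enabled inline with (?s).
with_inline = re.findall(r"(?s)<%(.*?)%>", template)

# Trim the surrounding whitespace from each captured block.
blocks = [block.strip() for block in with_flag]
print(blocks)
```

All three calls return the same three captures here, and the multiline block comes back as a single match. Other engines spell the option differently, so treat the flag name as specific to Python.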

Tim Pietzcker answered Oct 12 '22 11:10


Use \<\%(.*?)\%\>. You need .*? to get non-greedy pattern matching, so that each match stops at the first %> it finds instead of the last one.

EDIT: To solve the multiline problem, you can't rely on the plain . wildcard, because by default it matches every character except a newline. How you make it match newlines differs between regular expression engines, so I can tell you what to do if you tell me which engine you are using.
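
To make the two points concrete, here is a small sketch using Python's `re` module. Python is only an assumed engine, and the `line` and `multiline` strings are made-up inputs. It compares greedy and non-greedy matching on one line, then shows the default . failing on a block that spans lines.

```python
import re

# Two tag pairs on one line, with ordinary markup between them.
line = "<% a %><h1>Test Template</h1><% b %>"

# Greedy: .* runs to the last %> on the line, swallowing the <h1> block.
greedy = re.findall(r"<%(.*)%>", line)

# Non-greedy: .*? stops at the first %> after each <%.
lazy = re.findall(r"<%(.*?)%>", line)

# A block that spans several lines.
multiline = "<%\nvariable = value;\n%>"

# Without a "dot matches newline" option, . cannot cross the line breaks,
# so nothing matches.
no_flag = re.findall(r"<%(.*?)%>", multiline)

print(greedy)
print(lazy)
print(no_flag)
```

The greedy pattern returns one long capture, the non-greedy pattern returns the two separate tag bodies, and the multiline input gives an empty list until newline matching is enabled.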

Rafe Kettler answered Oct 12 '22 10:10