How to capture multiline regex between two tags?

Tags:

regex

What's the best way to select all the text between two comment tags? For example:

<!-- Text 1
     Text 2
     Text 3
-->

<\!--.* captures <!-- Text 1 but not Text 2, Text 3 or -->, because . does not match a newline by default.

Edit: Following Basti M's answer, <\!--((?:.*\n)*)--> selects everything between the first <!-- and the last -->, i.e. lines 1 to 11 below.

How would I modify it to select only the lines inside each separate pair of tags, i.e. lines 1 to 4:

1 <!-- Text 1 //First
2      Text 2
3      Text 3
4 -->
5
6 More text
7 
8 <!-- Text 4
9      Text 5
10     Text 6
11 -->         //Last
asked Jan 12 '14 at 21:01 by alias51


3 Answers

Depending on your regex engine, use the s modifier (and add --> at the end of your expression).
This makes . match newline characters as well.
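A minimal Python sketch of the s modifier, assuming Python's re module, where the flag is spelled re.DOTALL (inline form (?s)). It shows that the same pattern fails on a multi-line comment without the flag and captures the whole comment with it:

```python
import re

# A single multi-line comment, as in the question.
comment = "<!-- Text 1\n     Text 2\n     Text 3\n-->"

# Without the flag, "." stops at the first newline, so "-->" is never reached.
without_flag = re.search(r"<!--(.*)-->", comment)

# re.DOTALL is Python's equivalent of the s modifier: "." now also matches "\n".
with_flag = re.search(r"<!--(.*)-->", comment, re.DOTALL)

print(without_flag)  # None
print(repr(with_flag.group(1)))
```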

If the s-flag is not available to you, you may use

<!--((?:.*\r?\n?)*)-->

Explanation:

<!--         #start of comment
  (           #start of capturing group
    (?:       #start of non-capturing group
      .*\r?\n? #match a run of non-newline characters plus an optional line break
    )*        #end of non-capturing group, repeated zero or more times
  )           #end of capturing group
-->           #end of comment
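Another common flag-free idiom, different from the pattern above, is the character class [\s\S], which matches any character including a newline (useful, for example, in older JavaScript engines that lack the s flag). A short Python sketch, assuming the re module and run without any flag:

```python
import re

comment = "<!-- Text 1\n     Text 2\n     Text 3\n-->"

# [\s\S] = "whitespace or non-whitespace", i.e. any character, newlines included,
# so no DOTALL/s flag is needed here.
match = re.search(r"<!--([\s\S]*)-->", comment)

print(repr(match.group(1)))
```

Like the pattern above, [\s\S]* is greedy; with several comments in the input, make it lazy ([\s\S]*?) to stop at the first -->.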

To match multiple comment blocks you can use

/(?:<!--((?:.*?\r?\n?)*)-->)+/g

Demo @ Regex101
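For the question's two-block example, a simpler sketch in Python (an assumption: Python's re rather than the engine used in the demo) uses a lazy .*? with re.DOTALL, and re.finditer in place of the /g flag. It also converts each match's offsets to 1-based line numbers to show that the blocks come out as lines 1 to 4 and 8 to 11:

```python
import re

# The question's example: two comment blocks with text in between.
text = (
    "<!-- Text 1\n"
    "     Text 2\n"
    "     Text 3\n"
    "-->\n"
    "\n"
    "More text\n"
    "\n"
    "<!-- Text 4\n"
    "     Text 5\n"
    "     Text 6\n"
    "-->\n"
)

# Lazy ".*?" stops at the first "-->" after each "<!--"; DOTALL lets it cross newlines.
pattern = re.compile(r"<!--(.*?)-->", re.DOTALL)

spans = []
for m in pattern.finditer(text):
    # Count newlines before the match boundaries to get 1-based line numbers.
    first = text.count("\n", 0, m.start()) + 1
    last = text.count("\n", 0, m.end()) + 1
    spans.append((first, last))
    print(f"lines {first}-{last}: {m.group(1)!r}")
```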

answered Oct 25 '22 at 10:10 by KeyNone


Use the s modifier so that . also matches newlines, e.g.:

/<!--(.*)-->/s

Demo: http://regex101.com/r/lH0jK9
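Note that .* in this pattern is greedy, so when the input holds several comments it runs from the first <!-- to the last -->, which is the behaviour described in the question's edit. A quick Python check, assuming re.DOTALL as the equivalent of /s, contrasting it with the lazy .*?:

```python
import re

text = "<!-- Text 1\n     Text 2\n-->\n\nMore text\n\n<!-- Text 4\n     Text 5\n-->\n"

# Greedy: ".*" extends to the LAST "-->", swallowing "More text" too.
greedy = re.findall(r"<!--(.*)-->", text, re.DOTALL)

# Lazy: ".*?" stops at the FIRST "-->", giving one match per comment.
lazy = re.findall(r"<!--(.*?)-->", text, re.DOTALL)

print(len(greedy), len(lazy))  # 1 2
```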

answered Oct 25 '22 at 09:10 by scrowler


Regex is not the right tool for parsing HTML or XML; use a proper parser instead. Here I use XPath:

$ cat file.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<test>
<!-- Text 1
     Text 2
     Text 3
-->
</test>

The test:

$ xmllint --xpath '/test/comment()' file.xml
<!-- Text 1
     Text 2
     Text 3
-->

If you are parsing HTML, use the --html switch.
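The same parser-based idea can be sketched in Python's standard library, swapping xmllint's XPath for xml.etree.ElementTree (an assumption: Python 3.8+, where TreeBuilder accepts insert_comments). ElementTree drops comments by default, so the parser is given a builder that keeps them:

```python
import xml.etree.ElementTree as ET

# Same content as file.xml above, held in memory for the example.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<test>
<!-- Text 1
     Text 2
     Text 3
-->
</test>
"""

# insert_comments=True keeps comments in the tree as Comment elements.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(xml_bytes, parser=parser)

# Comment nodes are elements whose tag is the ET.Comment factory function.
comments = [node.text for node in root if node.tag is ET.Comment]
for text in comments:
    print(repr(text))
```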

answered Oct 25 '22 at 11:10 by Gilles Quenot