
Regex match everything between two strings, spanning multiple lines

Tags:

regex

How do I write a regex that matches everything between two strings? The content between the two strings spans several lines and can contain any HTML characters.

For example:

<p>something</p>

<!-- OPTIONAL -->

<p class="sdf"> some text</p>
<p> some other text</p>

<!-- OPTIONAL END -->

<p>The end</p>

I want to strip out the whole optional part, but the greedy any-character match isn't doing what I want. The patterns I've tried are:

  • <!-- OPTIONAL -->.*<!-- OPTIONAL END -->
  • <!-- OPTIONAL -->(.*)<!-- OPTIONAL END -->
  • <!-- OPTIONAL -->(.*)\s+<!-- OPTIONAL END -->
  • (?=<!-- OPTIONAL -->)(.*)\s+<!-- OPTIONAL END -->

All of them match the opening optional tag if only the first part of the pattern is given, but none of them matches across the complete set of lines.

Here's an example: http://regexr.com?352bk

Thanks

LocustHorde asked May 30 '13 15:05


People also ask

How do you match everything including newline RegEx?

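By default the dot matches every character except line breaks (\n and, in some engines, \r). Use a character class such as [\s\S], which matches ALL characters, or enable the dotall (s) flag.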

What is multiline mode in RegEx?

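In multiline mode, ^ and $ match at the start and end of each line rather than only at the start and end of the whole input. In some engines (.NET, for example), $ then matches before a newline character ( \n ) or at the end of the input string, but it does not treat the carriage return/line feed combination ( \r\n ) as a single line ending.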

What is multiline flag in RegEx?

The m flag indicates that a multiline input string should be treated as multiple lines. For example, if m is used, ^ and $ change from matching at only the start or end of the entire string to the start or end of any line within the string. It does not change what the dot matches. In JavaScript, the corresponding multiline property of a RegExp object is read-only.
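To make the distinction concrete, here is a minimal Python sketch (the sample text and variable names are made up for illustration). It suggests that the multiline flag only moves the ^ and $ anchors, while the dotall flag is what lets the dot cross line breaks:

```python
import re

# Hypothetical sample text, used only for this illustration.
text = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ anchors only at the start of the whole string.
starts_default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also anchors right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# re.MULTILINE does not let the dot cross a newline; re.DOTALL does.
dot_multiline = re.search(r"first.*third", text, re.MULTILINE)
dot_dotall = re.search(r"first.*third", text, re.DOTALL)

print(starts_default)
print(starts_multiline)
print(dot_multiline is None, dot_dotall is not None)
```

So for a match that has to span several lines, the flag to reach for is dotall, not multiline.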

What is full match in RegEx?

In Python, the re.fullmatch() function returns a Match object if the whole string matches the search pattern of a regular expression, or None otherwise. The syntax of the fullmatch() function is as follows: re.fullmatch(pattern, string, flags=0)


2 Answers

Playing with your example, I think I found the answer. Try this in your code:

<!-- OPTIONAL -->[\w\W]*<!-- OPTIONAL END -->

I hope this helps.
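As a rough check of this pattern outside RegExr, here is a small Python sketch (Python is an assumption; the question does not say which language the pattern will run in). It applies the pattern to the sample from the question and removes the optional block without needing any flags:

```python
import re

# The sample markup from the question.
html = """<p>something</p>

<!-- OPTIONAL -->

<p class="sdf"> some text</p>
<p> some other text</p>

<!-- OPTIONAL END -->

<p>The end</p>
"""

# [\w\W] is "word character or non-word character", i.e. any character at all,
# including line breaks, so no dotall flag is required.
pattern = re.compile(r"<!-- OPTIONAL -->[\w\W]*<!-- OPTIONAL END -->")

stripped = pattern.sub("", html)
print(stripped)
```

Note that [\w\W]* is greedy: if the input contained several optional blocks, it would run from the first opening comment to the last closing one.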

Mauricio Damián Araoz answered Oct 07 '22 08:10


Check the dotall checkbox in RegExr :)

Without the dotall flag (the s in /regex/s), a dot (.) won't match line breaks (newlines or carriage returns), so .* cannot get past the end of the first line.

You should use .*? instead of .* to lazy match the optional content (see the PLEASE DO NOT MATCH! sentence in the examples).
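A short Python sketch of both points, assuming an input with two optional blocks (the sample string and variable names are invented for illustration; re.DOTALL is Python's equivalent of the s flag):

```python
import re

# Hypothetical input with two optional blocks and content to keep in between.
html = (
    "<p>start</p>\n"
    "<!-- OPTIONAL -->\n<p>first</p>\n<!-- OPTIONAL END -->\n"
    "<p>keep me</p>\n"
    "<!-- OPTIONAL -->\n<p>second</p>\n<!-- OPTIONAL END -->\n"
    "<p>The end</p>\n"
)

# Without dotall, the dot stops at the first line break, so nothing matches.
no_flag_matches = re.findall(r"<!-- OPTIONAL -->.*<!-- OPTIONAL END -->", html)

# Greedy with dotall: runs from the first opening comment to the LAST closing one.
greedy = re.compile(r"<!-- OPTIONAL -->.*<!-- OPTIONAL END -->", re.DOTALL)

# Lazy with dotall: stops at the nearest closing comment, block by block.
lazy = re.compile(r"<!-- OPTIONAL -->.*?<!-- OPTIONAL END -->", re.DOTALL)

greedy_result = greedy.sub("", html)
lazy_result = lazy.sub("", html)

print(no_flag_matches)
print(greedy_result)
print(lazy_result)
```

The greedy version also swallows the "keep me" paragraph between the two blocks, while the lazy version removes only the optional blocks themselves.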

sp00m answered Oct 07 '22 10:10