
Scrape all html between tags

Tags: .net, regex

I can't get this working after hours of searching and trial and error. I'm trying to return the text between two HTML tags, but the text spans multiple lines. Here's an example; I'm looking for a regex that matches all content between the tags.

<section id="mysection">
The text always starts on the line after the opening section tag.
It can be anything and even span multiple lines.
The closing tag always comes after the last line of text.
</section>

I've tried

Regex.Match(html, "<section id=\"mysection\">/s+(.*?)/s+</section>");

with some success, but it only worked when there was one line of text, not when there were line breaks and such. Using the example above, I want it to match "The text always starts on the line after the opening section tag. It can be anything and even span multiple lines. The closing tag always comes after the last line of text."

user2325641 asked May 13 '26 08:05



1 Answer

Use this:

Regex.Match(html, "\\<section id=\"mysection\"\\>(.*?)\\</section\\>", 
            RegexOptions.Singleline);

According to the documentation for RegexOptions.Singleline:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).

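Escaping the angle brackets is optional: < and > are not regex metacharacters in .NET, so the pattern matches the same text with or without the backslashes.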
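
To show how the Singleline option plays out on the example from the question, here is a minimal sketch of a complete program. The helper name ExtractSection and the sample input are assumptions made for illustration, not part of the original answer. The sketch also assumes you want the captured lines joined with single spaces, as in the desired output quoted in the question, and it uses \s* (backslash, not forward slash) to trim the whitespace around the captured text.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: returns the text inside <section id="mysection">...</section>
    // with every run of whitespace collapsed to one space, or null if nothing matches.
    public static string ExtractSection(string html)
    {
        // Singleline lets '.' match '\n'; the \s* on each side keeps the
        // surrounding line breaks out of the capture group.
        Match m = Regex.Match(
            html,
            @"<section id=""mysection"">\s*(.*?)\s*</section>",
            RegexOptions.Singleline);

        if (!m.Success)
        {
            return null;
        }

        // Join the captured lines: replace each whitespace run (including newlines) with a space.
        return Regex.Replace(m.Groups[1].Value, @"\s+", " ");
    }

    static void Main()
    {
        string html = "<section id=\"mysection\">\nFirst line.\nSecond line.\n</section>";
        Console.WriteLine(ExtractSection(html)); // prints "First line. Second line."
    }
}
```

If you need the original line breaks preserved, use m.Groups[1].Value directly and skip the final Regex.Replace call.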

Peter Rankin answered May 16 '26 02:05
