
Matching multiple lines with a regex

Tags:

c#

regex

I have an HTML file from a website, and I use a regex to search for words and write them to a document. I have this text:

<div class="scrollable " style="height: 200px;">
        <div>
            <p>CO-Schrank: nicht ben&ouml;tigtes ausbauen</p>
<p><strong>________________________________________________________________________</strong></p>

<p><strong>==&gt;&nbsp; wird nicht mehr ben&ouml;tigt!<br /></strong>z-B.: IUC</p>

<p>CO-Management in Gen. 2 implementieren</p>

<ol>
<li>Ausbau der PCI-Karten aus ZKA-PC in CO-PC- PC-Sys 02 TP 55, 56, 61 sind noch Profibus im ZKA-PC ==&gt; in CO-PC- PC-Sys 02 greift dann auf CO-PC f&uuml;r Datenaufzeichnung =&gt; Betrieb wieder aufnehmen</li>

<li>Ausbau der IUC</li>

<li>Testaufbau am CO-PC f&uuml;r den CO-Algorithmus und Datenspeicherung</li>

<li>Gen. 2 in CO-Management implementieren- pro Pr&uuml;fling 3 Min. (3 Min. x 48 HG x 10 Messungen)&nbsp;= 1440 Min. = 24 h- Messzeit 1-2 Min.</li>

</ol>


</div></div>

Now I also want all the text inside the <div> ... </div>. I wrote this code, but it does not work:

Match description = Regex.Match(line, "^<div class=\"scrollable \"^(.*?)$div>", 
    RegexOptions.Multiline);//multiple line

if (description.Success)
{
    //Console.WriteLine(status_id.Groups[1].Value);
    System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\\Webasto\\csv-"+zahl+".txt");
    file.WriteLine(id.Groups[1].Value + ";4;4;" + subject.Groups[1].Value + ";" + due_date.Groups[1].Value+";NULL;"+status_id.Groups[1].Value+";"//+assigned.Groups[1].Value
        +";"
        +priority.Groups[1].Value+";NULL;"+autor.Groups[1].Value+";0;"+created_on.Groups[1].Value+";"+start_date.Groups[1].Value+";"+done_ratio.Groups[1].Value+";"+hours.Groups[1].Value
        +";NULL;"+id.Groups[1].Value+";1;2;0;"+closed.Groups[1].Value+";");
    file.Close();
}
asked May 25 '26 22:05 by Hans Sroeb


1 Answer

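You have a misunderstanding of what Multiline means (I don't blame you; I have to think twice every time I use regex). RegexOptions.Multiline only changes the anchors: ^ and $ then match at the start and end of every line (each line ended by \n) instead of only at the start and end of the whole string. It does not make . match a line break.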
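To see the difference, here is a small sketch (the class and method names are made up for illustration) that runs the same ^<li> pattern with and without RegexOptions.Multiline, and then checks whether . can cross a \n:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Counts matches of "^<li>"; the ^ anchor is the only thing Multiline affects here.
    public static int CountLinesStartingWithLi(string text, RegexOptions options)
    {
        return Regex.Matches(text, "^<li>", options).Count;
    }

    // True if "." is allowed to match the \n between 'a' and 'b'.
    public static bool DotCrossesNewline(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    public static void Main()
    {
        string html = "<ol>\n<li>Ausbau der IUC</li>\n<li>Testaufbau</li>\n</ol>";

        // Without Multiline, ^ only matches at position 0, where "<ol>" starts.
        Console.WriteLine(CountLinesStartingWithLi(html, RegexOptions.None));      // 0
        // With Multiline, ^ also matches right after each \n.
        Console.WriteLine(CountLinesStartingWithLi(html, RegexOptions.Multiline)); // 2
        // Multiline leaves "." alone: it still does not match \n.
        Console.WriteLine(DotCrossesNewline(RegexOptions.Multiline));              // False
    }
}
```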

You need RegexOptions.Singleline, which makes . match every character including \n, so (.*?) can span line breaks and the whole string behaves as if it were one line.
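A minimal sketch of the fix, assuming the whole HTML document is in one string. If `line` in your code holds only a single line of the file (the name suggests it might), no option will help; read the whole file first, for example with File.ReadAllText. The pattern also drops the stray ^ and $ from your version and closes with </div>: without Multiline, ^ and $ can only match at the very start and end of the input, so in the middle of the pattern they can never match here. The class name ScrollableDivExtractor is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScrollableDivExtractor
{
    // Opening tag of the scrollable div (any further attributes up to '>'),
    // then a lazily captured body, then the first closing </div>.
    // Singleline lets "." match \n so (.*?) can span several lines.
    private static readonly Regex ScrollableDiv = new Regex(
        "<div class=\"scrollable \"[^>]*>(.*?)</div>",
        RegexOptions.Singleline);

    // Returns the captured body, or null if the div is not found.
    public static string Extract(string html)
    {
        Match m = ScrollableDiv.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // In real use this would be the whole file, e.g. File.ReadAllText(path).
        string html =
            "<div class=\"scrollable \" style=\"height: 200px;\">\n" +
            "        <div>\n" +
            "            <p>CO-Schrank: nicht ben&ouml;tigtes ausbauen</p>\n" +
            "<ol>\n<li>Ausbau der IUC</li>\n</ol>\n" +
            "</div></div>";

        string body = Extract(html);
        Console.WriteLine(body != null);                              // True
        Console.WriteLine(body.Contains("<li>Ausbau der IUC</li>"));  // True
    }
}
```

The lazy (.*?) stops at the first </div>, which in your HTML closes the inner <div>, so the captured text still begins with that inner <div> tag. Removing the remaining tags and decoding entities such as &ouml; (System.Net.WebUtility.HtmlDecode) is a separate step.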

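Side note: it is a bad idea to use regex to parse HTML. A pattern like (.*?)</div> does not know about nesting; it stops at whichever </div> comes first, not necessarily the one that closes your div. Use a decent HTML parser instead.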
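To make the side note concrete, here is a made-up snippet (hypothetical names again) showing that neither a lazy nor a greedy quantifier respects nesting:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestingPitfall
{
    // An outer div containing an inner div, followed by an unrelated sibling div.
    const string Html = "<div class=\"outer\"><div>inner</div>tail</div><div>next</div>";

    // Lazy: stops at the FIRST </div>, which closes the inner div.
    public static string Lazy() =>
        Regex.Match(Html, "<div class=\"outer\">(.*?)</div>", RegexOptions.Singleline).Groups[1].Value;

    // Greedy: runs to the LAST </div> in the input, swallowing the sibling.
    public static string Greedy() =>
        Regex.Match(Html, "<div class=\"outer\">(.*)</div>", RegexOptions.Singleline).Groups[1].Value;

    public static void Main()
    {
        Console.WriteLine(Lazy());   // <div>inner
        Console.WriteLine(Greedy()); // <div>inner</div>tail</div><div>next
        // The intended content, "<div>inner</div>tail", is neither.
    }
}
```

An HTML parser builds the element tree, so it knows which </div> belongs to which <div>.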

answered May 28 '26 11:05 by Patrick Hofman