Is regex the right tool to find a line of HTML?

Question

I have a PHP script that pulls some content off of a server, but the problem is that the line on which the content is changes every day, so I can't just pull a specific line. However, the content is contained within a div that has a unique id. Is it possible (and is it the best way) for regex to search for this unique id and then pass the line of which it's on back to my script?

Example:

HTML file:

<html><head><title>Example</title></head>
<body>
<div id="Alpha"> Blah blah blah </div>
<div id="Beta"> Blah Blah Blah </div>
</body>
</html>

So let's say that I'm looking for the line with an opening div tag with an id of alpha. The code should return 3, because on the third line is the div with the id of alpha.

beggs · Accepted Answer

At the risk of providing more up-votes for Jeff who has already crossed the mountains of madness... see here

The argument rages back and forth, but... it's is a simple one-off or little used script you are writing then sure use regex, if it's more complex and needs to be reliable with little future tweaking then I'd suggest using an HTML parser. HTML is a nasty often non-regular beast to tame. Use the right tool for the job... maybe in your case it's regex, or maybe its a full blown parser.

NawaMan · Answer

Generally, NO. But if you are sure that the div will always be one line or there is not another div inside it, you can use it without problem. Something like /<div id=\"mydivid\">(.*?)</div>/ or something similar.

Otherwise, DOMDocument would be a more sane way.

EDIT See from your HTML example. My answer would be "YES". RegEx is a very good tool for this.

I assume that you have the HTML as a continuous text not as lines (which will be slightly different). I also assume that you want the line number more that the line content.

Here is a rought PHP code to extract it. (just to give some idea)

$HTML =
"<html><head><title>Example</title></head>
<body>
<div id=\"Alpha\"> Blah blah blah </div>
<div id=\"Beta\"> Blah Blah Blah </div>
</body>
</html>";

$ID = "Alpha";

function GetLineOfDIV($HTML, $ID) {
    $RegEx_Alpha = '/
(<div id="'.$ID.'">.*?</div>)
/m';
    $Index       = preg_match($RegEx_Alpha, $HTML, $Match, PREG_OFFSET_CAPTURE);
    $Match       = $Match[1]; // Only the one in '(...)'
    if ($Match == "")
        return -1;

    //$MatchStr    = $Match[0]; Since you do not want it, so we comment it out.
    $MatchOffset = $Match[1];

    $StartLines = preg_split("/
/", $HTML, -1, PREG_SPLIT_OFFSET_CAPTURE);
    foreach($StartLines as $I => $StartLine) {
        $LineOffset = $StartLine[1];
        if ($MatchOffset <= $LineOffset)
            return $I + 1;
    }
    return count($StartLines);
}

echo GetLineOfDIV($HTML, $ID);

I hope I give you some idea.

Is regex the right tool to find a line of HTML?

Tags:

html

regex

php

waiwai933

2 Answers

beggs

NawaMan

Recent Activity

Donate For Us

Is regex the right tool to find a line of HTML?

Tags:

html

regex

php

waiwai933

2 Answers

beggs

NawaMan

Related questions

Recent Activity

Donate For Us