Get all HTML between two elements

Problem:
Extract all HTML between two headers, including the headers' own HTML. The header text is known, but not the formatting, tag name, etc. The headers are not within the same parent, and the content between them might (well, almost certainly) have children nested within its own children.

To clarify: headers could be inside an <h1> or <div> or any other tag. They may also be surrounded by <b>, <i>, <font> or more <div> tags. The key is that the only text within the element is the header text.
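The identification rule above (an element whose only text is the header text) can be sketched as a search for the outermost element whose whole text equals the header. This is a minimal sketch under stated assumptions: it runs on a hypothetical plain-object tree rather than real DOM nodes (in a browser the comparison would use each element's textContent), whitespace is normalized before comparing, and the names textOf, normalize and findHeader are illustrative only.

```javascript
// Hypothetical plain-object stand-in for DOM nodes (not a real DOM API):
//   element: { tag: "div", children: [...] }    text node: { text: "Start" }

// Concatenated text of a node and its descendants, similar to DOM textContent.
function textOf(node) {
  if (node.text !== undefined) return node.text;
  return node.children.map(textOf).join("");
}

// Collapse whitespace runs and trim, so layout newlines don't affect the comparison.
function normalize(s) {
  return s.replace(/\s+/g, " ").trim();
}

// Pre-order search that stops descending at the first element whose whole text
// equals headerText. That element is the outermost match, so formatting
// wrappers inside it (<b>, <i>, <font>, ...) count as part of the header.
function findHeader(node, headerText) {
  if (node.text !== undefined) return null;
  if (normalize(textOf(node)) === headerText) return node;
  for (const child of node.children) {
    const hit = findHeader(child, headerText);
    if (hit) return hit;
  }
  return null;
}

// Usage: a header written as <h1><b>Start</b></h1> next to other content.
const page = { tag: "body", children: [
  { tag: "div", children: [
    { tag: "h1", children: [{ tag: "b", children: [{ text: "\n Start " }] }] },
    { tag: "p", children: [{ text: "Oops" }] },
  ] },
] };
console.log(findHeader(page, "Start").tag); // h1
```

The enclosing <div> is skipped because its text is "Start Oops", not "Start".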

The tools I have available are C# 3.0 with a WebBrowser control, or jQuery/JavaScript.

I've taken the jQuery route, traversing the DOM, but I've run into the problem of handling children and adding them appropriately. Here is the code so far:

function getAllBetween(firstEl,lastEl) {
    var collection = new Array(); // Collection of Elements
    var fefound =false;
    $('body').find('*').each(function(){
        var curEl = $(this);
        if($(curEl).text() == firstEl) 
            fefound=true;
        if($(curEl).text() == lastEl) 
            return false;

        // need something to handle the children's children,
        // otherwise we get <table></table><tbody></tbody><tr></tr> etc.
        if (fefound)
            collection.push(curEl);
    });
    var div = document.createElement("DIV");
    for (var i=0,len=collection.length;i<len;i++){
        $(div).append(collection[i]);
    }
    return($(div).html());
}

Should I continue down this road, with some sort of recursive function checking/handling children, or would a whole new approach be better suited?
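One way to handle the children without the duplicated <table></table><tbody></tbody> output is to stop collecting a flat list of elements and instead copy a pruned subtree. (Note also that jQuery's .append() moves an existing element rather than copying it, so the code above pulls nodes out of the page.) The sketch below numbers every node in document order, then keeps a node whole if its subtree lies entirely between the two headers, drops it if it lies entirely outside, and otherwise keeps it as a wrapper around just its overlapping children. It is a minimal sketch, assuming a hypothetical plain-object tree instead of real DOM nodes and assuming both header nodes are already known; extractBetween, copy and toHtml are illustrative names. In a browser, the standard DOM Range API does a similar job natively: document.createRange(), then setStartBefore(startEl), setEndAfter(endEl) and cloneContents().

```javascript
// Hypothetical plain-object stand-in for DOM nodes (not a real DOM API):
//   element: { tag: "div", children: [...] }    text node: { text: "Start" }

// Deep copy of a subtree.
function copy(node) {
  return node.text !== undefined
    ? { text: node.text }
    : { tag: node.tag, children: node.children.map(copy) };
}

// Returns a pruned copy of root holding startNode, endNode and everything
// between them in document order. Ancestors that only partly overlap that
// span (e.g. the <div> around "Start") are kept as wrappers containing just
// the overlapping children, which is how children-of-children are handled.
function extractBetween(root, startNode, endNode) {
  // Pass 1: pre-order index of each node and of the last node in its subtree.
  const first = new Map();
  const last = new Map();
  let counter = 0;
  (function number(node) {
    first.set(node, counter++);
    for (const child of node.children || []) number(child);
    last.set(node, counter - 1);
  })(root);

  const lo = first.get(startNode);
  const hi = last.get(endNode); // include endNode's whole subtree
  if (lo === undefined || hi === undefined || lo > hi) return null;

  // Pass 2: keep only nodes whose subtree range [a, b] overlaps [lo, hi].
  function prune(node) {
    const a = first.get(node);
    const b = last.get(node);
    if (b < lo || a > hi) return null;         // entirely outside the span
    if (a >= lo && b <= hi) return copy(node); // entirely inside the span
    // Partial overlap (only possible for elements): keep the wrapper tag.
    return { tag: node.tag, children: node.children.map(prune).filter(Boolean) };
  }
  return prune(root);
}

// Serialize back to HTML (no attributes or escaping in this sketch).
function toHtml(node) {
  if (node.text !== undefined) return node.text;
  return `<${node.tag}>${node.children.map(toHtml).join("")}</${node.tag}>`;
}

// Usage with the structure of the sample markup (whitespace-only text omitted).
const startHeader = { tag: "div", children: [{ text: "Start" }] };
const endHeader = { tag: "div", children: [{ text: "End" }] };
const doc = { tag: "body", children: [
  { tag: "div", children: [
    startHeader,
    { tag: "table", children: [{ tag: "tbody", children: [{ tag: "tr", children: [
      { tag: "td", children: [{ text: "Oops" }] }] }] }] },
  ] },
  { tag: "div", children: [endHeader] },
] };
console.log(toHtml(extractBetween(doc, startHeader, endHeader)));
```

Both passes are linear in the number of nodes, and comparing index ranges avoids carrying a "have we passed the start yet?" flag through the recursion.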

For the sake of testing, here is some sample markup:

<body>
<div>
<div>Start</div>
<table><tbody><tr><td>Oops</td></tr></tbody></table>
</div>
<div>
<div>End</div>
</div>
</body>

Any suggestions or thoughts are greatly appreciated!

asked Nov 14 '22 06:11 by WSkid


1 Answer

My thought is a regex, something along the lines of

.*<(?<tag>.+)>Start</\1>(?<found_data>.+)<\1>End</\1>.*

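should get you everything between the Start and End div tags (in the found_data group). Since the markup spans several lines, . has to be allowed to match newlines (RegexOptions.Singleline in .NET).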
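A JavaScript transliteration of the regex idea, as a sketch. It departs from the pattern above in a few assumed ways: the tag name is captured separately from any attributes (so the closing-tag backreference still matches when the opening tag has a class), the two headers may use different tags, whitespace around the header text is allowed, the s (dotAll) flag lets . span the newlines in the sample, and the middle group is lazy. The function names htmlBetween and escapeRegex are hypothetical. It still works on raw text only, so a header written as <h1><b>Start</b></h1> matches just the inner <b> element.

```javascript
// Escape literal header text so characters such as "." or "(" are not treated as regex syntax.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Finds <tag ...>startText</tag> ... <tag ...>endText</tag> in raw markup.
// Returns { match, between } (match includes both headers) or null.
function htmlBetween(html, startText, endText) {
  const re = new RegExp(
    "<(?<startTag>\\w+)[^>]*>\\s*" + escapeRegex(startText) + "\\s*</\\k<startTag>>" +
      "(?<between>.*?)" +
      "<(?<endTag>\\w+)[^>]*>\\s*" + escapeRegex(endText) + "\\s*</\\k<endTag>>",
    "s" // dotAll: let . cross the newlines between the headers
  );
  const m = re.exec(html);
  return m ? { match: m[0], between: m.groups.between } : null;
}

// Usage with the question's sample markup.
const sample =
  "<body>\n<div>\n<div>Start</div>\n<table><tbody><tr><td>Oops</td></tr></tbody></table>\n" +
  "</div>\n<div>\n<div>End</div>\n</div>\n</body>";
console.log(htmlBetween(sample, "Start", "End").match);
```

The captured between text is a raw slice of the source, so it can contain unbalanced tags: for the sample it ends with </div>\n<div>\n, which a DOM-based approach would avoid.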

answered Dec 15 '22 04:12 by dutt