
how to extract body contents using regexp [duplicate]

I have this HTML in a variable:

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body anything="">
        content
    </body>

</html>

or

<html>

    <head>
        .
        .
        anything
        .
        .
    </head>

    <body>
        content
    </body>

</html>

The result should be:

content
faressoft asked Sep 02 '10 14:09



2 Answers

Note that the string-based answers supplied above should work in most cases. The one major advantage offered by a regex solution is that you can more easily provide for a case-insensitive match on the open/close body tags. If that is not a concern to you, then there's no major reason to use regex here.
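The string-based approach is not shown in this thread, so here is a minimal sketch of what it might look like. The function name `extractBodyByIndex` and the sample string are made up for illustration, and the sketch assumes lowercase tag names and no `>` character inside the body tag's attribute values.

```javascript
// Hypothetical helper (not from the original answers): pulls out the text
// between <body ...> and </body> using only string searches.
// Assumes lowercase tags; "<BODY>" would not be found.
function extractBodyByIndex(html) {
  const openStart = html.indexOf("<body");
  if (openStart === -1) return null;

  // End of the opening tag, attributes included.
  const openEnd = html.indexOf(">", openStart);
  if (openEnd === -1) return null;

  // Use the last closing tag in the string.
  const closeStart = html.lastIndexOf("</body>");
  if (closeStart === -1 || closeStart < openEnd) return null;

  return html.slice(openEnd + 1, closeStart);
}

const sampleHtml =
  '<html><head><title>t</title></head><body class="x">\n  content\n</body></html>';
console.log(extractBodyByIndex(sampleHtml).trim()); // prints "content"
```

Because `indexOf` is case-sensitive, this version misses `<BODY>`, which is the limitation the paragraph above describes.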

And for the people who see HTML and regex together and throw a fit: since you are not actually trying to parse HTML here, this is something you can do with regular expressions. If, for some reason, a stray </body> appeared somewhere other than the real end of the body, the captured span could be wrong, but aside from that, you have a sufficiently specific scenario that regular expressions are capable of doing what you want:

const strVal = yourStringValue; // assign your string here, or pass your own variable straight to pattern.exec below
// [\s\S] matches any character, line breaks included; the i flag makes the tag match case-insensitive
const pattern = /<body[^>]*>([\s\S]*)<\/body>/i;
const array_matches = pattern.exec(strVal); // null when there is no <body>...</body> pair

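After the above executes, array_matches[1] will hold whatever came between the opening <body ...> tag and the closing </body> tag. If the string contains no body element, exec returns null, so check for that before indexing.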
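The same idea can be wrapped in a small function with a null check. This is only a sketch: the name `extractBody` and the sample strings are assumptions for illustration, and it uses `[\s\S]` to match across line breaks.

```javascript
// Hypothetical wrapper around the regex approach described above.
// Returns the text between the opening <body ...> tag and the last
// </body>, or null when no such pair exists.
function extractBody(html) {
  // \b stops "<bodyfoo>" from matching; the i flag ignores tag case.
  const match = /<body\b[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

// The two shapes from the question: with and without attributes.
const withAttribute =
  '<html>\n<head>anything</head>\n<body anything="">\n    content\n</body>\n</html>';
const withoutAttribute =
  "<html>\n<head>anything</head>\n<body>\n    content\n</body>\n</html>";

console.log(extractBody(withAttribute).trim());    // prints "content"
console.log(extractBody(withoutAttribute).trim()); // prints "content"
```

The captured text keeps its surrounding whitespace, so `trim()` is applied only when printing.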

Jeffrey Blake answered Sep 18 '22 23:09



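// xhr is assumed to be a completed XMLHttpRequest instance (hypothetical name);
// responseText lives on the instance, not on the XMLHttpRequest constructor
var matched = xhr.responseText.match(/<body[^>]*>([\w\W]*)<\/body>/i);
if (matched) alert(matched[1]); // match() returns null when no body element is found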
Catalin Enache answered Sep 20 '22 23:09
