 

Get contents of BODY without DOCTYPE, HTML, HEAD and BODY tags

Tags:

php

I am trying to include an HTML file within a PHP system (not a problem), but that HTML file also needs to be usable on its own, for various reasons. So I need to know how I can strip the doctype, html, head and body tags in the context of the PHP include, if that's possible.

I'm not particularly good at PHP (doh!), so my searches of the PHP manual and the web haven't helped me figure this out. Any help or reading tips, or both, would be much appreciated.

enrico pax asked Jun 29 '12 00:06



2 Answers

Since the substr() method seemed to be too much for some to swallow, here is a DOM parser method:

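// Parse the standalone HTML file into a DOM tree.
$d = new DOMDocument;
$d->loadHTML(file_get_contents('/path/to/my.html'));

// Copy every child of <body> into a second, empty document.
$mock = new DOMDocument;
$body = $d->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $mock->appendChild($mock->importNode($child, true));
}

// Print the body contents without the doctype, html, head and body tags.
echo $mock->saveHTML();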

http://codepad.org/MQVQ3XQP

Anyone who wishes to see the earlier substr() version can find it in the revisions.
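For the include use case in the question, the same DOM idea can be wrapped in a function. This is only a sketch under a few assumptions: `body_inner_html` and the `$page` sample are made-up names for illustration, and parser warnings (which `loadHTML()` tends to raise for HTML5 elements or sloppy markup) are collected with `libxml_use_internal_errors()` instead of being printed. It serializes each child of the body with `saveHTML($node)` rather than importing the nodes into a second document.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the markup inside <body>.
function body_inner_html(string $html): string
{
    // loadHTML() rejects an empty string on PHP 8, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    // Collect parser warnings instead of printing them into the page.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> in turn, leaving out the wrapper tags.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Sample standalone page, made up for this example.
$page = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><h1>Hello</h1><p>Standalone page.</p></body></html>';

echo body_inner_html($page);
```

In the including script this would probably be called as `echo body_inner_html(file_get_contents('/path/to/my.html'));`, while the HTML file itself stays untouched and usable on its own.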

Jared Farrish answered Nov 15 '22 21:11


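$site = file_get_contents("http://www.google.com/");

// Capture everything between the opening and closing body tags.
// The "i" flag ignores case; the "s" flag lets "." match newlines too.
if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $site, $matches)) {
    echo $matches[1];
}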
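A regex like this is quick, but it can misfire when the text `</body>` also appears inside a script or comment, so treat it as a shortcut for pages you control. Below is a small sketch of the same pattern applied to a local string instead of a remote site. The function name `body_by_regex` and the `$page` sample are assumptions for illustration, and returning the input unchanged when no body tag is found is just one possible design choice (it lets a bare fragment pass through as-is).

```php
<?php
// Hypothetical helper: returns the text between <body ...> and </body>,
// or the input unchanged when no body element is found.
function body_by_regex(string $html): string
{
    // "\b" stops the pattern from matching other tag names that start with "body".
    if (preg_match('/<body\b[^>]*>(.*?)<\/body>/is', $html, $matches)) {
        return $matches[1];
    }
    return $html;
}

// Sample standalone page, made up for this example.
$page = "<!DOCTYPE html>\n<html>\n<head><title>Demo</title></head>\n"
      . "<body class=\"home\">\n<p>Standalone page.</p>\n</body>\n</html>";

// trim() drops the line breaks that surrounded the body contents.
echo trim(body_by_regex($page));
```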
Patrick answered Nov 15 '22 21:11