I am trying to figure out how to retrieve the full HTML page source (that is, all of it) from an <iframe> whose src is on the same origin as the page it is embedded in. I want the exact source code at any given time, which may be dynamic because JavaScript or PHP generates the <iframe>'s HTML output. This means AJAX calls like $.get() will not work for me, as the page could have been modified via JavaScript or generated uniquely based on the request time or mt_rand() in PHP. I have not been able to retrieve the exact <!DOCTYPE> declaration from my <iframe>.
I have been experimenting and searching through Stack Overflow and have not found a solution that retrieves all of the page source, including the <!DOCTYPE> declaration.
One of the answers in "How do I get the entire page's HTML with jQuery?" suggests that, in order to retrieve the <!DOCTYPE> information, you need to construct this declaration manually by reading the document.doctype property of the <iframe>'s document and then adding all of its attributes to the <!DOCTYPE> declaration yourself. Is this really the only way to retrieve this information from the <iframe>'s HTML page source?
Here are some notable Stack Overflow posts that I have looked through and that this is not a duplicate of:
Here is some of my local test code that illustrates my best attempt so far, which only retrieves the data within and including the <iframe>'s <html> tag:
main.html
<html>
<head>
<title>Testing with iframe</title>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script type="text/javascript">
function test() {
var doc = document.getElementById('iframe-source').contentWindow.document;
var html = $('html', doc).clone().wrap('<p>').parent().html();
$('#output').val(html);
}
</script>
</head>
<body>
<textarea id="output"></textarea>
<iframe id="iframe-source" src="iframe.html" onload="javascript:test()"></iframe>
</body>
</html>
iframe.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html class="html-tag-class">
<head class="head-tag-class">
<title>iframe Testing</title>
</head>
<body class="body-tag-class">
<h2>Testing header tag</h2>
<p>This is <strong>very</strong> exciting</p>
</body>
</html>
And here is a screenshot of these files run together in Google Chrome version 27.0.1453.110 m:
As you can see, Google Chrome's "Inspect element" shows that the <!DOCTYPE> declaration is present within the <iframe>, so how can I retrieve it along with the page source? This question also applies to any other declarations or tags that are not contained within the <html> tags.
Any help or advice on retrieving this full page source code via JavaScript would be greatly appreciated.
Here is a way to build it from the doctype. It seems to work for HTML 4 and HTML5; I didn't test it with other document types such as SVG.
<html>
<head>
<title>Testing with iframe</title>
<script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
<script type="text/javascript">
function test() {
var d = document.getElementById('iframe-source').contentWindow.document;
var t = d.doctype; // DocumentType node of the iframe's document (null if it has no doctype)
$('#output').val(
"<!DOCTYPE "+t.name+
(t.publicId? (" PUBLIC "+JSON.stringify(t.publicId)+" ") : "")+
(t.systemId? JSON.stringify(t.systemId) :"")+
">\n" + d.documentElement.outerHTML );
}
</script>
</head>
<body>
<textarea id="output"></textarea>
<iframe id="iframe-source" src="iframe.html" onload="test()"></iframe>
</body>
</html>
This also uses documentElement.outerHTML, so any attributes on the <html> element itself are included in the output.
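The string-building step can also be pulled out into a small helper, which makes the edge cases easier to see. The following is only a sketch: the names doctypeToString and fullSource are hypothetical (not part of jQuery or the DOM), and it assumes the object passed in exposes the name, publicId and systemId properties that a DOM DocumentType node has. Unlike the inline version above, it returns an empty string when the document has no doctype and emits the SYSTEM keyword when only a system identifier is present.

```javascript
// Hypothetical helper: turn a DocumentType-like object into a <!DOCTYPE ...> string.
// Assumes `doctype` has `name`, `publicId` and `systemId` (empty strings when absent),
// or is null when the document has no doctype at all.
function doctypeToString(doctype) {
  if (!doctype) {
    return "";
  }
  var out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    // A system identifier without a public one needs the SYSTEM keyword.
    out += " SYSTEM";
  }
  if (doctype.systemId) {
    out += ' "' + doctype.systemId + '"';
  }
  return out + ">";
}

// Hypothetical helper: doctype line (if any) followed by the serialized <html> element.
// `doc` is expected to look like a DOM Document (doctype + documentElement.outerHTML).
function fullSource(doc) {
  var prefix = doctypeToString(doc.doctype);
  return (prefix ? prefix + "\n" : "") + doc.documentElement.outerHTML;
}
```

In the page above, this would be called as fullSource(document.getElementById('iframe-source').contentWindow.document) from the iframe's onload handler. Note that it still only covers the doctype and the <html> element; comments or other nodes outside <html> are not included.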