
Is it possible to retrieve the *full* HTML page source of an iframe with Javascript?

I am trying to figure out how to retrieve the full (that means all data) HTML page source from an <iframe> whose src is from the same originating domain as the page that it is embedded on. I want the exact source code at any given time, which could be dynamic due to Javascript or php generating the <iframe> html output. This means AJAX calls like $.get() will not work for me as the page could have been modified via Javascript or generated uniquely based on the request time or mt_rand() in php. I have not been able to retrieve the exact <!DOCTYPE> declaration from my <iframe>.

I have been experimenting around and searching through Stack Overflow and have not found a solution that retrieves all of the page source including the <!DOCTYPE> declaration.

One of the answers in How do I get the entire page's HTML with jQuery? suggests that in order to retrieve the <!DOCTYPE> information, you need to construct this declaration manually, by retrieving the <iframe>'s document.doctype property and then adding all of the attributes to the <!DOCTYPE> declaration yourself. Is this really the only way to retrieve this information from the <iframe>'s HTML page source?

Here are some notable Stack Overflow posts that I have looked through and that this is not a duplicate of:

  • Javascript: Get current page CURRENT source
  • Get selected element's outer HTML
  • https://stackoverflow.com/questions/4612143/how-to-get-page-source-using-jquery
  • How do I get the entire page's HTML with jQuery?
  • Jquery: get all html source of a page but excluding some #ids
  • jQuery: Get HTML including the selector?

Here is some of my local test code that illustrates my best attempt so far, which only retrieves the data within and including the <iframe>'s <html> tag:

main.html

<html>
<head>
  <title>Testing with iframe</title>
  <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
  <script type="text/javascript">
  function test() {
    var doc = document.getElementById('iframe-source').contentWindow.document;
    var html = $('html', doc).clone().wrap('<p>').parent().html();
    $('#output').val(html);
  }
  </script>
</head>
<body>

<textarea id="output"></textarea>
<iframe id="iframe-source" src="iframe.html" onload="javascript:test()"></iframe>

</body>
</html>


iframe.html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html class="html-tag-class">
  <head class="head-tag-class">
    <title>iframe Testing</title>
  </head>
  <body class="body-tag-class">
    <h2>Testing header tag</h2>
    <p>This is <strong>very</strong> exciting</p>
  </body>
</html>


And here is a screenshot of these files run together in Google Chrome version 27.0.1453.110 m: [screenshot: iframe testing]

Summary

As you can see, Google Chrome's Inspect element shows that within the <iframe> the <!DOCTYPE> declaration is present, so how can I retrieve this data with the page source? This question also applies to any other declarations or other tags that are not contained within the <html> tags.


Any help or advice on retrieving this full page source code via Javascript would be greatly appreciated.

Aiias asked Jun 09 '13 04:06


1 Answer

Here is a way to build it from the doctype. It seems to work for HTML 4 and 5; I didn't test it with things like SVG.

<html>
<head>
  <title>Testing with iframe</title>
  <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
  <script type="text/javascript">
  function test() {
    var d = document.getElementById('iframe-source').contentWindow.document;
    var t = d.doctype; // the DOM property is lowercase "doctype"; "docType" would be undefined
    $('#output').val(
        "<!DOCTYPE "+t.name+ 
          (t.publicId? (" PUBLIC "+JSON.stringify(t.publicId)+" ") : "")+
          (t.systemId? JSON.stringify(t.systemId) :"")+
          ">\n" + d.documentElement.outerHTML  );
  }
  </script>
</head>
<body>

<textarea id="output"></textarea>
<iframe id="iframe-source" src="iframe.html" onload="test()"></iframe>

</body>
</html>

This also uses documentElement.outerHTML, so you get any attributes set on the <html> element itself.
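
The same idea can be packaged as reusable helpers. What follows is only a sketch, not tested against the page in the question: the function names serializeDoctype and serializeDocument are made up for illustration, and it assumes the public and system identifiers contain no double quotes. It also walks the document's top-level nodes, which may cover comments that sit outside the <html> element. Note that the result reflects the live DOM, so it will not necessarily match the original source byte for byte (whitespace between top-level nodes, for instance, is not kept in the DOM).

```javascript
// Hypothetical helper: rebuild a <!DOCTYPE ...> string from an object that
// exposes name, publicId and systemId, as document.doctype does.
// Assumes the identifiers contain no double quotes.
function serializeDoctype(doctype) {
  if (!doctype) return '';              // the document has no doctype
  var out = '<!DOCTYPE ' + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM';                   // system identifier without a public one
  }
  if (doctype.systemId) {
    out += ' "' + doctype.systemId + '"';
  }
  return out + '>';
}

// Hypothetical helper: serialize every top-level node of a document, not just
// the <html> element. Other node types are skipped in this sketch.
function serializeDocument(doc) {
  var parts = [];
  Array.from(doc.childNodes).forEach(function (node) {
    switch (node.nodeType) {
      case 10: // DOCUMENT_TYPE_NODE
        parts.push(serializeDoctype(node));
        break;
      case 8:  // COMMENT_NODE
        parts.push('<!--' + node.nodeValue + '-->');
        break;
      case 1:  // ELEMENT_NODE (the <html> element)
        parts.push(node.outerHTML);
        break;
    }
  });
  return parts.join('\n');
}
```

Inside test() it could be called as $('#output').val(serializeDocument(document.getElementById('iframe-source').contentWindow.document));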

dandavis answered Oct 13 '22 01:10