
Problem with iFrames in Selenium

I'm using Selenium (in Python) to scrape a webpage that is built almost entirely with JavaScript.
For instance, this is the body of the page:

<body class="bodyLoading">
<!-- this is required for GWT history support -->
<iframe id="__gwt_historyFrame" role="presentation" width="0" height="0" tabindex="-1" title="empty" style="position:absolute;width:0;height:0;border:0" src="javascript:''">  </iframe>
<!-- For printing window contents  -->
<iframe id="__printingFrame" role="presentation" width="0" height="0" tabindex="-1" title="empty" style="width:0;height:0;border:0;"   />


<!-- TODO : RECOMMENDED if your web app will not function without JavaScript enabled -->
<noscript>
<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif">
 Your web browser must have JavaScript enabled in order for
 Regulations.gov to display correctly.
</div>
</noscript>
</body>

For some reason, Selenium (using the Firefox engine) does not seem to evaluate the JavaScript on this page. If I call the get_html_source function, it returns only the HTML above, not the JavaScript-generated HTML that I can see in my browser (and in the Selenium browser). Unfortunately, the src attribute of the iframe just says javascript:'', which I can't make sense of.

Any thoughts on how to make sure Selenium processes this iframe?

Asked by tchaymore on Jun 15 '11 at 20:06




1 Answer

The iframes are separate documents, so you won't get their contents included in the HTML code for the main page; you have to read them separately.

You can do this using Selenium's select_frame function.

You can reference a frame by its name, CSS selector, XPath expression, etc., as with other elements.

When you select a frame, you change Selenium's context, so you can then access the frame's contents as if it were the current page.
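
As a minimal sketch of that context switch: the function names in this answer (select_frame, get_html_source) belong to the older Selenium RC client, whereas the example below assumes the current WebDriver Python bindings, where the counterparts are driver.switch_to.frame(...) and driver.page_source. The helper name read_frame_source is made up for illustration, and driver is assumed to be an already-created WebDriver instance.

```python
def read_frame_source(driver, frame_ref):
    """Return the HTML of one frame, then restore the top-level context.

    Assumptions: `driver` is a Selenium WebDriver instance and `frame_ref`
    is a frame name/id, a zero-based index, or a frame WebElement.
    """
    driver.switch_to.frame(frame_ref)        # context is now the frame's document
    try:
        return driver.page_source            # source of the frame, not the main page
    finally:
        driver.switch_to.default_content()   # always return to the main page
```

With the page from the question, a call might look like read_frame_source(driver, "__printingFrame"), using the iframe id shown in the pasted HTML.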

If you have frames within frames, you can continue this process down through the frame tree.

Obviously, you also need a way to move back up the frame path. Selenium provides this through the same select_frame function: pass relative=up to move the context to the parent of the current frame, or relative=top to move to the main page in the browser.
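
In the WebDriver Python bindings (an assumption here; the answer itself describes the Selenium RC call), the rough counterparts of relative=up and relative=top are driver.switch_to.parent_frame() and driver.switch_to.default_content(). One possible way to make sure a script always steps back up is a small context manager. The names inside_frame and nested_frame_source are hypothetical helpers, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_ref):
    """Enter a frame for the duration of a `with` block, then move one level up."""
    driver.switch_to.frame(frame_ref)
    try:
        yield driver
    finally:
        # Counterpart of relative=up: back to the parent of the current frame.
        driver.switch_to.parent_frame()


def nested_frame_source(driver, outer_ref, inner_ref):
    """Read the source of a frame that sits inside another frame."""
    with inside_frame(driver, outer_ref):
        with inside_frame(driver, inner_ref):
            html = driver.page_source
        # Here the context is the outer frame again.
    # Here the context is the document the call started from.
    return html
```

To jump straight to the main page from any depth instead of stepping up one level at a time, driver.switch_to.default_content() can be called directly.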

So using this function you can navigate around the frames in a page.

You can't access them all at once, because only one frame can be in context at a time. A single get_html_source call will therefore never return the contents of every frame, but you can navigate between the frames within your Selenium script and get the HTML source of each frame separately.
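
Collecting the source of each frame separately could be sketched as a recursive walk. This again assumes the WebDriver Python bindings rather than the Selenium RC functions named above. collect_frame_sources is a hypothetical helper, and the strings "css selector" and "iframe, frame" are the locator strategy and selector chosen for this illustration.

```python
def collect_frame_sources(driver, path=()):
    """Return {path: html} for the current document and every nested frame.

    `path` is a tuple of frame positions, e.g. () for the document the walk
    starts in and (0, 1) for the second frame inside its first frame.
    """
    sources = {path: driver.page_source}
    # "css selector" is the strategy string behind By.CSS_SELECTOR.
    frames = driver.find_elements("css selector", "iframe, frame")
    for index, frame in enumerate(frames):
        driver.switch_to.frame(frame)       # descend into the child frame
        sources.update(collect_frame_sources(driver, path + (index,)))
        driver.switch_to.parent_frame()     # climb back before the next sibling
    return sources
```

Each entry still comes from its own page_source read while that frame is in context, so this does not get around the one-frame-at-a-time limit; it only automates the navigation.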

Hope that helps.

Answered by Spudley on Sep 26 '22 at 03:09