Selenium download full html page

Tags:

1 Answer

Users add more content to the page (from previous dates) by clicking the <div onclick="control.moreData()" id="moreLink">More...</div> element at the bottom of the page.

So to get your desired content, you could use Selenium to click the id="moreLink" element or execute some JavaScript to call control.moreData(); in a loop.

For example, if you want to get all content as far back as Friday, February 15, 2013 (a string in this format appears to exist for every date of loaded content), your Python might look something like this:

# Assumes `browser` is a Selenium WebDriver that has already opened the page.
content = browser.page_source
# Keep requesting older entries until the target date string shows up.
while "Friday, February 15, 2013" not in content:
    # Same function the "More..." element calls from its onclick handler.
    browser.execute_script("control.moreData();")
    content = browser.page_source
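
The loop above never stops if the date string never shows up. Here is a rough sketch of a bounded version. The helper name load_until and the max_calls and pause parameters are my own, not part of Selenium; the only things it needs from the driver are page_source and execute_script, which WebDriver provides.

```python
import time


def load_until(driver, marker, max_calls=50, pause=1.0):
    """Call control.moreData() until `marker` shows up in the page source.

    Returns True if the marker was found, False if max_calls ran out first.
    `max_calls` and `pause` are illustrative defaults, not tuned values.
    """
    for _ in range(max_calls):
        if marker in driver.page_source:
            return True
        # Same function the "More..." element calls from its onclick handler.
        driver.execute_script("control.moreData();")
        # Give the newly requested entries a moment to arrive.
        time.sleep(pause)
    # Final check after the last call.
    return marker in driver.page_source
```

You would call it as load_until(browser, "Friday, February 15, 2013") and read browser.page_source afterwards if it returns True.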

EDIT:

If you disable JavaScript in your browser and reload the page, you will see that there is no "trends" content at all. That tells me those items are loaded dynamically, meaning they are not part of the HTML document that is downloaded when you open the page. Selenium's .get() waits for the HTML document to load, but not for all JS to complete. There's no telling whether async JS will complete before or after any other event. It completes when it's ready, and the timing could be different every time. That would explain why you sometimes get all, some, or none of that content when you call browser.page_source: it depends on how fast the async JS happens to be running at that moment.

So, after opening the page, you might try waiting a few seconds before getting the source - giving the JS which loads the content time to complete.

import time

browser.get(googleURL)
time.sleep(3)  # crude fixed pause so the content-loading JS can finish
content = browser.page_source
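
A fixed sleep is either too long or too short, so it may be worth polling the source until the content you expect is there. This is only a sketch: wait_for_source, is_ready, timeout and poll are names I made up, and what counts as "ready" depends on the page.

```python
import time


def wait_for_source(driver, is_ready, timeout=10.0, poll=0.5):
    """Poll driver.page_source until is_ready(source) is true, then return it.

    Raises TimeoutError if the condition is still false once `timeout`
    seconds have passed. `is_ready` is any function taking the HTML string.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if is_ready(source):
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not appear within %.1f s" % timeout)
        time.sleep(poll)
```

After browser.get(googleURL) you would pass browser plus a check such as a lambda that looks for a string you know only appears once the dynamic content has loaded. Selenium also ships WebDriverWait in selenium.webdriver.support.ui, which does the same kind of polling against the driver itself.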

150

answered Sep 30 '22 11:09

Dingredient

Tags:

python

selenium

user2392965
