How to print page source with Selenium

Tags: python, selenium
I have the code below, which runs a Twitter search and scrolls through the infinitely scrolling results. The line 'print data' is not working for me, though. Any ideas?

# Import Selenium stuff
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException

# Import other needed packages
import sys
import unittest, time, re

# Call up Firefox, do the Twitter search, click the "All" link and start paging
class Sel(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "https://twitter.com"
        self.verificationErrors = []
        self.accept_next_alert = True

    def test_sel(self):
        driver = self.driver
        delay = 3
        driver.get(self.base_url + "/search?q=storstrut&src=typd")
        driver.find_element_by_link_text("All").click()
        for i in range(1,100):
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(4)
        html_source = driver.page_source
        data = html_source.encode('utf-8')
        print data


if __name__ == "__main__":
    unittest.main()


asked Feb 18 '15 20:02

DIGSUM

1 Answer

You have a lot of unused code and weird imports, but you are on the right track.

Here is a simplified version, with explanatory comments.

import time
from selenium import webdriver


# launch Firefox
driver = webdriver.Firefox()

# load Twitter page
driver.get("https://twitter.com/search?q=storstrut&src=typd")

# the following JavaScript scrolls to the bottom of the page body.  Since
# Twitter uses "infinite scrolling", more content is added to the bottom of
# the DOM as you scroll.  Because the call is inside the loop, it scrolls
# down 100 times.
for _ in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# print all of the page source that was loaded
print driver.page_source.encode("utf-8")

# quit and close browser
driver.quit()
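
As a variation on the loop above, here is a minimal sketch that pauses after each scroll and stops early once the page height stops growing. The function name scroll_until_stable and the pause and max_scrolls defaults are my own illustrative choices, not part of Selenium; the function only assumes the driver has execute_script and page_source, which a Selenium WebDriver does.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_scrolls=100):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object exposing `execute_script` and `page_source`,
    such as a Selenium WebDriver. `pause` and `max_scrolls` are
    illustrative defaults (assumptions), so tune them for your page.
    Returns the page source as text.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly requested content time to load
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing new was appended, so stop early
        last_height = new_height
    return driver.page_source
```

With a real browser you would create the driver with webdriver.Firefox(), call driver.get(...) on the search URL, pass the driver to scroll_until_stable, and then call driver.quit(). On Python 3, print is a function and page_source is already a text string, so print(scroll_until_stable(driver)) works without the .encode("utf-8") call.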

answered Sep 28 '22 00:09

Corey Goldberg
