Using Python with Selenium to scrape dynamic web pages

Question

The site has a few links at the top labeled 1, 2, 3, and next. Clicking a numbered link dynamically loads some data into a content div. Clicking next goes to a page with links labeled 4, 5, 6, and next, and shows the data for page 4.

I want to scrape the data from the content div for every numbered link (I don't know how many there are; the site shows only three at a time plus next).

Please give an example of how to do this, for instance with the site www.cnet.com.

Please show me how to download the series of pages using Selenium; I will parse them with Beautiful Soup on my own.

jfs · Accepted Answer

General layout (not tested):

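#!/usr/bin/env python
from contextlib import closing
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Firefox  # pip install selenium
from selenium.webdriver.common.by import By

url = "http://example.com"

def find_link(browser, text):
    """Return the link with the given text, or None if it is not on the page."""
    try:
        # find_element raises instead of returning None when nothing matches
        return browser.find_element(By.LINK_TEXT, text)
    except NoSuchElementException:
        return None

# use Firefox to get pages with JavaScript-generated content
with closing(Firefox()) as browser:
    n = 1
    while url:
        browser.get(url)  # load the page that has the 1, 2, 3, next links
        first = n
        link = find_link(browser, str(n))
        while link:
            browser.get(link.get_attribute("href"))  # get individual 1, 2, 3, 4 pages
            #### save(browser.page_source)
            browser.back()  # return to the page that has the 1, 2, 3, next links
            n += 1
            link = find_link(browser, str(n))  # look it up again after navigating

        if n == first:
            break  # no new numbered links on this page, so stop
        link = find_link(browser, "next")
        url = link.get_attribute("href") if link else None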
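
The `save(browser.page_source)` placeholder above is left undefined. One possible way to fill it in is sketched below, assuming each page should go into its own numbered HTML file. The function signature (it takes the page number as a second argument), the `pages` directory, and the file naming scheme are illustrative choices, not part of the original answer.

```python
from pathlib import Path


def save(page_source, number, directory="pages"):
    """Write one page's HTML to <directory>/page_<number>.html and return the path.

    Hypothetical helper: the name matches the placeholder in the loop above,
    but the extra arguments and the file layout are assumptions.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    path = out_dir / f"page_{number:03d}.html"  # zero-padded so files sort in page order
    path.write_text(page_source, encoding="utf-8")
    return path
```

Inside the loop it would be called as `save(browser.page_source, n)`. Each saved file can later be read back with `path.read_text(encoding="utf-8")` and passed to Beautiful Soup, for example `BeautifulSoup(html, "html.parser")`.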

Tags: python, selenium

Koushik
