How to scrape dynamic webpages with Python

[What I'm trying to do]

Scrape the webpage below for used car data.
http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1

[Issue]

I want to scrape every page of the search results. The URL above shows only the first 30 items, which the code below (which I wrote) can scrape. Links to the other pages are displayed as 1 2 3..., but the link targets appear to be generated by JavaScript. I searched for useful information but couldn't find any.

from bs4 import BeautifulSoup
import urllib.request

html = urllib.request.urlopen("http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")

soup = BeautifulSoup(html, "lxml")
total_cars = soup.find(class_="change change_01").find('em').string
tmp = soup.find(class_="change change_01").find_all('span')
car_start, car_end = tmp[0].string, tmp[1].string

# get urls to car detail pages
car_urls = []
heading_inners = soup.find_all(class_="heading_inner")
for heading_inner in heading_inners:
    href = heading_inner.find('h4').find('a').get('href')
    car_urls.append('http://www.goo-net.com' + href)

for url in car_urls:
    html = urllib.request.urlopen(url)
    soup = BeautifulSoup(html, "lxml")
    #title
    print(soup.find(class_='hdBlockTop').find('p', class_='tit').string)
    #price of car itself
    print(soup.find(class_='price1').string)
    #price of car including tax
    print(soup.find(class_='price2').string)

    tds = soup.find(class_='subData').find_all('td')
    # year
    print(tds[0].string)
    # distance
    print(tds[1].string)
    # displacement
    print(tds[2].string)
    # inspection
    print(tds[3].string)

[What I'd like to know]

How to scrape all of the result pages. I'd prefer to use BeautifulSoup4 (Python), but if that is not the appropriate tool, please suggest others.
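
For reference, the number of result pages to walk through follows from values the code above already extracts: the total count in `total_cars` and the 30 items shown per page. A minimal sketch, assuming the count is a plain number string that may contain thousands separators (the exact format on the site is not confirmed here); `parse_count` and `pages_needed` are hypothetical helper names:

```python
import math

ITEMS_PER_PAGE = 30  # the listing shows 30 cars per page, per the question


def parse_count(text):
    """Convert a count string such as '1,234' to an int.

    The comma-separated format is an assumption about the site's markup.
    """
    return int(text.replace(",", "").strip())


def pages_needed(total_cars_text, per_page=ITEMS_PER_PAGE):
    """Return how many result pages are needed to cover every item."""
    return math.ceil(parse_count(total_cars_text) / per_page)
```

Calling `pages_needed(total_cars)` would then give an upper bound for any loop over the pager links.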

[My environment]

Windows 8.1
Python 3.5
PyDev (Eclipse)
BeautifulSoup4

Any guidance would be appreciated. Thank you.


asked Nov 19 '15 05:11

dixhom

2 Answers

You can use Selenium, as in the sample below:

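from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://example.com')
# Selenium 4 removed find_element_by_class_name; use find_element with a By locator.
# Other locators (By.LINK_TEXT, By.CSS_SELECTOR, ...) work the same way.
element = driver.find_element(By.CLASS_NAME, "yourClassName")
element.click()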
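
Applied to the question, the same idea can be extended to click through the numbered pager links and hand each rendered page to BeautifulSoup. The following is only a sketch under several assumptions: that the pager links have the visible text "2", "3", and so on (as the question describes), that clicking one replaces the page so the old link goes stale, and that Selenium 4 with a Firefox driver is installed. `walk_pages` and `selenium_walk` are hypothetical names, not part of Selenium.

```python
def walk_pages(get_html, go_to_page, max_pages):
    """Collect the HTML of pages 1..max_pages.

    get_html() returns the HTML of the page currently shown.
    go_to_page(n) navigates to page n and returns False if it cannot.
    """
    pages = [get_html()]
    for page_number in range(2, max_pages + 1):
        if not go_to_page(page_number):
            break  # no link for this page number; stop early
        pages.append(get_html())
    return pages


def selenium_walk(start_url, max_pages, timeout=10):
    """Drive Firefox through the pager and return each page's HTML."""
    # Imported here so walk_pages stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(start_url)

        def go_to_page(page_number):
            try:
                # Assumption: the pager link's visible text is the page number.
                link = driver.find_element(By.LINK_TEXT, str(page_number))
            except NoSuchElementException:
                return False
            link.click()
            # Assumption: the click replaces the page, so the old link goes stale.
            WebDriverWait(driver, timeout).until(EC.staleness_of(link))
            return True

        return walk_pages(lambda: driver.page_source, go_to_page, max_pages)
    finally:
        driver.quit()
```

Each string returned by `selenium_walk` could then be passed to `BeautifulSoup(html, "lxml")` and processed with the question's existing parsing code. If the site swaps the list in place without replacing the pager, the staleness wait would time out and a different wait condition would be needed.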


answered Oct 19 '22 17:10

ahmad valipour

The Python module splinter may be a good starting point. It drives an external browser (such as Firefox) and accesses the browser's DOM rather than dealing with the raw HTML only.
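
A rough sketch of what that could look like for the question's pager, assuming a recent splinter release that provides `browser.links.find_by_text` and assuming the pager links carry the page number as their text. `iter_result_pages` and `main` are hypothetical names, and `main` is not called automatically because it needs splinter and a browser driver installed.

```python
def iter_result_pages(browser, start_url, max_pages):
    """Yield the rendered HTML of result pages 1..max_pages.

    `browser` is expected to behave like a splinter Browser.
    """
    browser.visit(start_url)
    yield browser.html
    for page_number in range(2, max_pages + 1):
        # Assumption: the pager link's visible text is the page number.
        links = browser.links.find_by_text(str(page_number))
        if links.is_empty():
            break  # no such pager link; treat this as the last page
        links.first.click()
        yield browser.html


def main():
    # Requires splinter and a Firefox driver; imported here on purpose.
    from splinter import Browser

    url = ("http://www.goo-net.com/php/search/summary.php"
           "?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")
    with Browser("firefox") as browser:
        for html in iter_result_pages(browser, url, max_pages=3):
            print(len(html))  # replace with BeautifulSoup parsing
```

Because the generator yields the HTML as the browser currently renders it, each page can be fed to the BeautifulSoup code from the question without changes to the parsing logic.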

answered Oct 19 '22 16:10

ChrisGuest


Tags: python, html, beautifulsoup, web-scraping, scrape