I am trying to scrape this website: http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all
However, it requires scrolling down in order to load additional data, and I have no idea how to scroll down using BeautifulSoup or Python. Does anybody here know how?
The code is a bit of a mess but here it is.
import scrapy
from scrapy.selector import Selector
from testtest.items import TesttestItem
import datetime
from selenium import webdriver
from bs4 import BeautifulSoup
from HTMLParser import HTMLParser  # Python 2; on Python 3 this is html.parser
import re
import time

class MLStripper(HTMLParser):
    # Standard tag-stripping helper: collects the text content of any HTML fed to it.
    def __init__(self):
        HTMLParser.__init__(self)
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

class MySpider(scrapy.Spider):
    name = "A1Locker"
    allowed_domains = ['www.a1lockerrental.com']
    start_urls = ['http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all']

    def parse(self, response):
        # The unit listings are rendered by JavaScript, so fetch each page
        # with Selenium and hand the rendered HTML to BeautifulSoup.
        url = 'http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Small'
        driver = webdriver.Firefox()
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        url2 = 'http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Medium'
        driver2 = webdriver.Firefox()
        driver2.get(url2)
        soup2 = BeautifulSoup(driver2.page_source, 'html.parser')

        items = []
        inside = "Indoor"
        outside = "Outdoor"
        inside_units = ["5 x 5", "5 x 10"]
        outside_units = ["10 x 15", "5 x 15", "8 x 10", "10 x 10",
                         "10 x 20", "10 x 25", "10 x 30"]

        sizeTagz = soup.findAll('span', {"class": "sss-unit-size"})
        sizeTagz2 = soup2.findAll('span', {"class": "sss-unit-size"})
        rateTagz = soup.findAll('p', {"class": "unit-special-offer"})
        specialTagz = soup.findAll('span', {"class": "unit-special-offer"})
        typesTagz = soup.findAll('div', {"class": "unit-info"})
        rateTagz2 = soup2.findAll('p', {"class": "unit-special-offer"})
        specialTagz2 = soup2.findAll('span', {"class": "unit-special-offer"})
        typesTagz2 = soup2.findAll('div', {"class": "unit-info"})

        yield {'date': datetime.datetime.now().strftime("%m-%d-%y"),
               'name': "A1Locker"}

        size = []
        for n in range(len(sizeTagz)):
            print len(rateTagz)
            print len(typesTagz)
            if "Outside" in typesTagz[n].get_text():
                size.append(re.findall(r'\d+', sizeTagz[n].get_text()))
                size.append(re.findall(r'\d+', sizeTagz2[n].get_text()))
                print "logic hit"

        for i in range(len(size)):
            yield {'size': size[i]}

        driver.close()
        driver2.close()
The desired output is to display the data collected from this webpage: http://www.a1lockerrental.com/self-storage/mo/st-louis/4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=all
Doing so would require being able to scroll down to view the rest of the data. At least, that is how I imagine it would be done.
Thanks, DM123
There is a WebDriver function that provides this capability; BeautifulSoup doesn't do anything besides parse the HTML you hand it.
Check this out: http://webdriver.io/api/utility/scroll.html
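That link documents the JavaScript (WebdriverIO) bindings; in the Python bindings you used the equivalent is `driver.execute_script`. A minimal sketch (the `scroll_to_bottom` helper and its `pause`/`max_rounds` parameters are mine, not part of Selenium): keep scrolling until `document.body.scrollHeight` stops growing, then read `driver.page_source` as your code already does.

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll a Selenium WebDriver to the bottom of the page repeatedly
    until the page height stops growing (i.e. no more content loads)."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom and give the page time to load more.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared; we are at the real bottom
        last_height = new_height
    return last_height
```

Call it right after `driver.get(url)` and before building the soup, so the rendered source includes the lazily loaded units.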
The website you're trying to scrape loads its content dynamically using JavaScript. Unfortunately, many scraping libraries, such as BeautifulSoup, cannot execute JavaScript on their own. There are a number of options, however, many in the form of headless browsers. A classic one is PhantomJS, but it may be worth taking a look at this great list of options on GitHub, some of which play nicely with BeautifulSoup, such as Selenium.
Keeping Selenium in mind, the answer to this Stack Overflow question may also help.
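For what it's worth, here is part of why a plain HTTP fetch of those URLs can never work: everything after the `#` is a URL fragment, which the browser never sends to the server; the page's JavaScript reads it and renders the matching category client-side, so `category=Small` and `category=Medium` return identical raw HTML. A small sketch (Python 3 standard library) pulling the category out of the fragment:

```python
from urllib.parse import urlparse, parse_qs

url = ("http://www.a1lockerrental.com/self-storage/mo/st-louis/"
       "4427-meramec-bottom-rd-facility/unit-sizes-prices#/units?category=Small")

# The fragment is everything after '#'; it is stripped before the request is sent.
fragment = urlparse(url).fragment            # "/units?category=Small"

# Parse the fragment itself as a mini-URL to recover its query parameters.
category = parse_qs(urlparse(fragment).query)["category"][0]
print(category)  # -> Small
```

This is why a JavaScript-capable browser (headless or not) is needed: the server only ever sees the URL up to the `#`.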