How to scrape a website with table content that is retrieved by JavaScript?

Question

I want to scrape a table from a website. The table's HTML looks like this:

<table class="table table-hover data-table sort display">
        <thead>
          <tr>
            <th class="Column1">
            </th>
            <th class="Column2">
            </th>
          </tr>
        </thead>
        <tbody>
          <tr ng-repeat="item in filteredList | orderBy:columnToOrder:reverse">
            <td>{{item.Col1}}</td>
            <td>{{item.Col2}}</td>
          </tr>
        </tbody>
</table>

It seems that this website is built with a JavaScript framework that retrieves the table content from the backend through web services.

The problem is: how can I scrape the table data when the actual values are not in the HTML? In the code above, the content is only template placeholders enclosed in {{ }}. Does this make the website unscrapable? Is there any solution? Thank you.

I am using Python and beautifulsoup4.

PepperoniPizza · Accepted Answer

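When the content is generated by JavaScript, BeautifulSoup on its own is usually not the right tool: it only parses the HTML it is given and does not execute scripts, so you only ever see the {{ }} placeholders. I use Selenium to load the page in a real browser first. Try this and see whether the HTML you get back is scrapable: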

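import time

from selenium import webdriver

url = "https://example.com"  # placeholder: replace with the page you want to scrape

driver = webdriver.Firefox()
driver.get(url)
driver.set_window_position(0, 0)
driver.set_window_size(100000, 200000)  # very large browser window
# scroll to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)  # wait for the JavaScript to load the data

# now print the rendered HTML
print(driver.page_source)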

At this point, you can use BeautifulSoup to scrape the data out of driver.page_source. Note: you will need to install Selenium and Firefox.
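As a rough sketch of that last step, the function below pulls the cell text out of rendered table HTML with BeautifulSoup. The function name extract_table_rows, the choice of the "data-table" class as the selector, and the sample values are all assumptions made for illustration. The sample string stands in for driver.page_source so the snippet runs without a browser.

```python
from bs4 import BeautifulSoup


def extract_table_rows(html, table_class="data-table"):
    """Return the text of every <td> cell, one list per table row.

    `table_class` is an assumed selector taken from the question's markup;
    adjust it to whatever identifies the table on the real page.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th>, so they are skipped
            rows.append(cells)
    return rows


# Made-up sample of what the table might look like after rendering.
# In the real script this would be driver.page_source.
sample_html = """
<table class="table table-hover data-table sort display">
  <thead>
    <tr><th class="Column1"></th><th class="Column2"></th></tr>
  </thead>
  <tbody>
    <tr><td>Alice</td><td>30</td></tr>
    <tr><td>Bob</td><td>25</td></tr>
  </tbody>
</table>
"""

print(extract_table_rows(sample_html))
# prints [['Alice', '30'], ['Bob', '25']]
```

If the result still shows {{item.Col1}} instead of real values, the page had not finished rendering when page_source was read, and a longer wait is probably needed.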

Tags: python, beautifulsoup, web-scraping

guagay_wk
