I want to scrape a table from a website. The table's HTML looks like this:
<table class="table table-hover data-table sort display">
<thead>
<tr>
<th class="Column1">
</th>
<th class="Column2">
</th>
</tr>
</thead>
<tbody>
<tr ng-repeat="item in filteredList | orderBy:columnToOrder:reverse">
<td>{{item.Col1}}</td>
<td>{{item.Col2}}</td>
</tr>
</tbody>
</table>
It seems that this website is built with a JavaScript framework that retrieves the table content from the backend through web services.
The problem is: how can we scrape the table data when the cells do not hold the actual values? In the code above, the content is only template placeholders enclosed in {{ }}. Does this make the website unscrapable? Is there a solution? Thank you.
I am using Python and BeautifulSoup4.
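When the content is generated by JavaScript, BeautifulSoup on its own is usually not the right tool, because it only parses the HTML it is given and does not run scripts. I use Selenium. Try this and see whether the HTML you get back is scrapable: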
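import time
from selenium import webdriver

url = "https://example.com"  # placeholder: replace with the page you want to scrape

driver = webdriver.Firefox()
driver.get(url)
driver.set_window_position(0, 0)
driver.set_window_size(100000, 200000)
# scroll to the bottom so content that loads on scroll is requested too
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)  # wait for the JavaScript to load the data
# now print the rendered HTML
print(driver.page_source)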
At this point, you can use BeautifulSoup to scrape the data out of driver.page_source. Note: you will need to install Selenium and Firefox.
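For the BeautifulSoup step, something along these lines might work. It is only a sketch: the helper name extract_table_rows is made up for illustration, and rendered_html is an invented stand-in for driver.page_source so the snippet runs without a browser. It assumes that, once the page has rendered, the {{ }} placeholders have been replaced by real cell values and the table still carries the data-table class from the question.

```python
from bs4 import BeautifulSoup


def extract_table_rows(html):
    """Return the data rows of the first 'data-table' table as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches when "data-table" is any one of the table's CSS classes
    table = soup.find("table", class_="data-table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain only <th>, so they are skipped
            rows.append(cells)
    return rows


# Invented sample standing in for driver.page_source after rendering
rendered_html = """
<table class="table table-hover data-table sort display">
<thead><tr><th class="Column1"></th><th class="Column2"></th></tr></thead>
<tbody>
<tr><td>A1</td><td>B1</td></tr>
<tr><td>A2</td><td>B2</td></tr>
</tbody>
</table>
"""

for row in extract_table_rows(rendered_html):
    print(row)
```

With the Selenium code above, you would call extract_table_rows(driver.page_source) instead of passing the sample string. If the rows still come back containing {{ }}, the page had probably not finished rendering when the source was read.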