Scrape a dynamic website

Question

What is the best method to scrape a dynamic website where most of the content is generated by what appears to be ajax requests? I have previous experience with a Mechanize, BeautifulSoup, and python combo, but I am up for something new.

--Edit-- For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an api.

Adam Davis · Accepted Answer

This is a difficult problem because you either have to reverse engineer the javascript on a per-site basis, or implement a javascript engine and run the scripts (which has its own difficulties and pitfalls).

It's a heavy weight solution, but I've seen people doing this with greasemonkey scripts - allow Firefox to render everything and run the javascript, and then scrape the elements. You can even initiate user actions on the page if needed.

-Adam

Scrape a dynamic website

Tags:

python

ajax

beautifulsoup

screen-scraping

Colin Barnes

1 Answers

Adam Davis

Recent Activity

Donate For Us

Scrape a dynamic website

Tags:

python

ajax

beautifulsoup

screen-scraping

Colin Barnes

1 Answers

Adam Davis

Related questions

Recent Activity

Donate For Us