
How do I parse an infinite scrolling page (e.g. wallbase.cc/search/sky) with Python?

Not sure if there's anything with Mechanize or BeautifulSoup that could help. Any suggestions would be greatly appreciated!

Rev3rb asked Nov 16 '11 17:11



1 Answer

Mechanize and BeautifulSoup can't interface with the JavaScript that drives the infinite scroll. Neither of them executes scripts, so they only see the HTML of the initial response.

Selenium can, because it drives a real browser.
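A rough sketch of the Selenium route, in case it helps: keep scrolling to the bottom until the page height stops growing, then grab the rendered HTML. The helper names (scroll_to_bottom, fetch_full_page), the pause length, the round limit, and the choice of Firefox are my own assumptions, not anything specific to Wallbase, and a fixed sleep is a crude stand-in for a proper wait.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing (or max_rounds is hit).

    `driver` is anything with a Selenium-style execute_script() method.
    Returns the number of scrolls that made the page taller.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        # Jump to the bottom so the page's scroll handler fires.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed delay; give the AJAX request time to finish
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended, assume we reached the end
        last_height = new_height
        loaded += 1
    return loaded


def fetch_full_page(url):
    """Hypothetical wrapper: open url, scroll it out, return the rendered HTML."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Firefox()  # assumes Firefox and its driver are installed
    try:
        driver.get(url)
        scroll_to_bottom(driver)
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by fetch_full_page can then be handed to BeautifulSoup as usual, since by that point the extra thumbnails are already in the DOM.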

Additionally, if you watch the AJAX requests that fire while you scroll, you will see a POST request to http://wallbase.cc/search/160 with the following request data:

query:sky
board:123
res_opt:eqeq
res:0x0
aspect:0
nsfw_sfw:1
nsfw_sketchy:0
nsfw_nsfw:0
thpp:32
orderby:relevance
orderby_opt:desc

The 160 in the URL corresponds to the image range, so the request before it was to wallbase.cc/search/128. The step of 32 between the two matches thpp:32 in the request data.
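If you would rather skip the browser, the same POST requests can be rebuilt by hand. Below is a minimal sketch using only the standard library. The form fields and the URL pattern are copied from the captured request above; the function names, and the guess that offsets start at 0 and always step by 32, are my assumptions. It only builds the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://wallbase.cc/search"  # URL pattern from the captured request
PAGE_SIZE = 32  # assumed step, taken from thpp:32

# Form fields exactly as seen in the captured POST request.
FORM_DATA = {
    "query": "sky",
    "board": "123",
    "res_opt": "eqeq",
    "res": "0x0",
    "aspect": "0",
    "nsfw_sfw": "1",
    "nsfw_sketchy": "0",
    "nsfw_nsfw": "0",
    "thpp": "32",
    "orderby": "relevance",
    "orderby_opt": "desc",
}


def page_offsets(num_pages, page_size=PAGE_SIZE):
    """Offsets to put in the URL: 0, 32, 64, ... (starting at 0 is a guess)."""
    return [i * page_size for i in range(num_pages)]


def build_request(offset, form_data=FORM_DATA):
    """Build (but do not send) the POST request for one batch of images."""
    body = urlencode(form_data).encode("ascii")
    return Request(f"{BASE_URL}/{offset}", data=body, method="POST")


# The sixth batch lines up with the request seen in the browser.
requests_to_send = [build_request(o) for o in page_offsets(6)]
print(requests_to_send[-1].full_url)
```

Each request could then be sent with urllib.request.urlopen and the response parsed with BeautifulSoup. I have not verified what the server returns (an HTML fragment or something else), so check one response by hand first.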

dm03514 answered Oct 29 '22 00:10