
Parsing a website with Jsoup that dynamically loads as the user scrolls

I'm working on a project that displays song names and a link to each song by parsing a website with Jsoup. The only problem is that I can only get the first 10 elements I want, because the website generates more elements as you scroll down. The specific website I'm trying to parse is a music site called TrappedIO. When viewing the website, you'll notice that more song names and images appear as you scroll down. When I use Inspect Element in Chrome, I can see that scrolling generates more of the elements I'm trying to parse.

The CSS Path of what I'm parsing: #content > div.container > div > div:nth-child(index of element)

The problem is when I get this website with Jsoup using this method,

Document doc = Jsoup.connect(url).get();

HTML returned from Jsoup: Pastebin

Only the first 10 elements I want to parse are returned, along with all the other HTML. To be more specific, I'm parsing using Jsoup in an AsyncTask, then populating a ListView with the parsed data.

Any ideas? Any suggestions on how to load everything at once? Any response is very much appreciated, thanks.

Willter asked Oct 21 '22 03:10



1 Answer

It's quite simple: to get the next set of 10, just request the following URL:

http://trapped.io/?page=2

To generalize, pass the page number you want in the page=PAGE_NUMBER query parameter; each page returns a set of 10 elements.
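As a rough sketch of how that could look with Jsoup, the loop below requests page=1, page=2, and so on, and collects the text of each matching element. The selector is the CSS path from the question with the nth-child index dropped. The class name PagedScraper, the MAX_PAGES cap, and the idea that a page past the end returns no matching elements are my own assumptions, so check them against what the site actually returns.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PagedScraper {
    // Base URL taken from the answer above.
    static final String BASE_URL = "http://trapped.io/";
    // CSS path from the question, without the nth-child index.
    static final String ITEM_SELECTOR = "#content > div.container > div > div";
    // Hypothetical safety cap so the loop always terminates.
    static final int MAX_PAGES = 50;

    /** Builds the URL for a 1-based page number. */
    static String pageUrl(int page) {
        return BASE_URL + "?page=" + page;
    }

    /**
     * Fetches pages one by one and collects the text of each matching element.
     * Stops at the first page with no matches (assumed to mean "past the end")
     * or when MAX_PAGES is reached.
     */
    static List<String> fetchAllItems() throws IOException {
        List<String> items = new ArrayList<>();
        for (int page = 1; page <= MAX_PAGES; page++) {
            Document doc = Jsoup.connect(pageUrl(page)).get();
            Elements found = doc.select(ITEM_SELECTOR);
            if (found.isEmpty()) {
                break;
            }
            for (Element element : found) {
                items.add(element.text());
            }
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        for (String item : fetchAllItems()) {
            System.out.println(item);
        }
    }
}
```

On Android you would call something like fetchAllItems() from the doInBackground method of the AsyncTask you already have, since network requests are not allowed on the main thread, and then fill the ListView in onPostExecute.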

Edit:

Just a side note: there might be legal issues with scraping sites. I hope you've double-checked that scraping their site is legally OK.

bhargavg answered Oct 23 '22 10:10
