I'm working on a project that displays song names and a link to each song by parsing a website with Jsoup. The only problem is that I can only get the first 10 elements I want, because the website generates more elements as you scroll down. The specific website I'm trying to parse is a music site called TrappedIO. When viewing the site, you'll notice that more song names and images appear as you scroll down. When I use Inspect Element in Chrome, I can see that scrolling generates more of the elements I'm trying to parse.
The CSS Path of what I'm parsing: #content > div.container > div > div:nth-child(index of element)
The problem is when I get this website with Jsoup using this method,
Document doc = Jsoup.connect(url).get();
HTML returned from Jsoup: Pastebin
Only the first 10 elements I want to parse are returned, along with all the other HTML. To be more specific, I'm parsing using Jsoup in an AsyncTask, then populating a ListView with the parsed data.
Any ideas? Any suggestions on how to load everything at once? Any response is very much appreciated, thanks.
It's quite simple: to get the next set of 10, just request the following URL:
http://trapped.io/?page=2
To generalize, set the page query parameter to the page number you want (page=PAGE_NUMBER), and you get the set of 10 elements on that page.
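As a rough sketch of that idea, the loop below builds the URL for each page and keeps collecting until a page comes back empty. The class name `PagedScraper`, the methods `pageUrl` and `collectAll`, the `maxPages` cap, and the `fetchPage` callback are all hypothetical names chosen for illustration. It also assumes that numbering starts at `page=1` and that a page past the end yields no matching elements, neither of which has been verified against the site. The fetcher is passed in so the paging logic stays independent of Jsoup; the comment shows roughly where a Jsoup call would go.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class PagedScraper {
    // Builds the URL for a 1-based page number, e.g. http://trapped.io/?page=2
    static String pageUrl(String baseUrl, int page) {
        return baseUrl + "?page=" + page;
    }

    // Fetches pages 1..maxPages in order and stops early at the first empty page.
    // fetchPage is a hypothetical callback that returns the items found at a URL.
    // With Jsoup it could wrap something like
    //   Jsoup.connect(url).get().select("#content > div.container > div > div")
    // and map each element to text, catching the IOException that get() may throw.
    static List<String> collectAll(String baseUrl, int maxPages,
                                   Function<String, List<String>> fetchPage) {
        List<String> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            List<String> items = fetchPage.apply(pageUrl(baseUrl, page));
            if (items.isEmpty()) {
                break; // assumption: an empty page means there is nothing more to load
            }
            all.addAll(items);
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in fetcher for illustration only: pretends page 3 is empty.
        Function<String, List<String>> fake = url ->
                url.endsWith("page=3")
                        ? Collections.<String>emptyList()
                        : Collections.singletonList("song from " + url);
        System.out.println(collectAll("http://trapped.io/", 50, fake));
    }
}
```

Since the parsing already runs in an AsyncTask, the same loop could run there and hand the combined list to the ListView once it finishes. The `maxPages` cap is only a guard against looping forever if the site never returns an empty page.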
Just a side note: there might be legal issues with scraping sites. I hope you've double-checked that it's legally OK to scrape their site.