Fetch contents (loaded through an AJAX call) of a web page

I am a beginner at crawling. I need to fetch the posts and comments from a link, and I want to automate this process. I considered using a web crawler and jsoup for this, but was told that web crawlers are mostly used for websites with greater depth.

Sample for a page: Jive community website

For this page, when I view the page source, I can see only the post and not the comments. I think this is because the comments are fetched through an AJAX call to the server.

Hence, when I use jsoup, it doesn't fetch the comments.
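The effect can be reproduced in isolation. Below is a minimal sketch using only the JDK (Java 11+). It assumes a made-up setup: a local server stands in for the real site, and the `/page` and `/comments` paths, the markup, and the class name `AjaxGapDemo` are all hypothetical, not the actual Jive endpoints. It only illustrates that a plain HTTP fetch returns the static markup without running the script that would load the comments.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class AjaxGapDemo {
    // Hypothetical page: the post is in the markup, the comments container is empty.
    static final String PAGE_HTML =
        "<html><body><div id=\"post\">Original post</div>"
        + "<div id=\"comments\"></div>"
        + "<script>/* a browser would request /comments here and fill #comments */</script>"
        + "</body></html>";

    // Hypothetical fragment that the script would load later.
    static final String COMMENTS_HTML = "<ul><li>First comment</li></ul>";

    // Start a throwaway local server on a free port (stand-in for the real site).
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/page", ex -> respond(ex, PAGE_HTML));
        server.createContext("/comments", ex -> respond(ex, COMMENTS_HTML));
        server.start();
        return server;
    }

    static void respond(HttpExchange ex, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Plain HTTP GET: returns the response body as-is, no script is executed.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        try {
            String page = fetch(base + "/page");
            String comments = fetch(base + "/comments");
            System.out.println("page contains comment: " + page.contains("First comment"));
            System.out.println("endpoint contains comment: " + comments.contains("First comment"));
        } finally {
            server.stop(0);
        }
    }
}
```

In this sketch the static page never contains the comment text; it only shows up in the response of the separate request that the script would have made.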

So how can I automate the process of fetching posts and comments?

Adarsh Konchady asked Dec 17 '13 11:12



1 Answer

Jsoup is only an HTML parser. Unfortunately, it cannot parse content generated by JavaScript / AJAX, because jsoup does not execute scripts.

The solution: use a library that can execute scripts.

Here are some examples I know of:

  • HtmlUnit
  • Java Script Engine
  • Apache Commons BSF
  • Rhino

If such a library doesn't support parsing or selectors, you can at least use it to get the HTML produced by the scripts (which can then be parsed by jsoup).

ollo answered Sep 17 '22 22:09