
Jsoup - read from an html url where code is hidden

Tags: java, html, jsoup

I'm trying to use the jsoup library to get the 'li' elements from a website. The problem is this:

  • If I open the page source with CTRL+U (which is the same HTML that jsoup reads), the 'ul' tag is hidden.

hidden result

  • If I open the code with Google Chrome's "Inspect" function, the 'li' elements are shown.

shown result

Posting the code is not necessary; I only want to know how I can access these 'li' elements with jsoup or another free Java library, given that this information is hidden in the source code (and therefore from jsoup).

The site is https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco; try searching for something (e.g. Tachi).

Asked by Fidelis on Feb 12 '26 at 22:02


1 Answer

The problem is that Jsoup does not execute scripts. It only gets the HTML as the server sends it, before the AJAX code that fills in the list has run.
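One quick way to confirm this is to count the 'li' tags in the raw markup, without running any scripts. The following is only a minimal sketch using the Java standard library: the class name `RawHtmlCheck`, its two helper methods, and the sample markup string are hypothetical and made up for illustration, and the regex count is a rough check rather than a real HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawHtmlCheck {

    // Counts opening tags such as <li> or <li class="x"> in a string of markup.
    // Regex-based and deliberately crude: fine for a quick check, not a parser.
    static int countOpeningTags(String html, String tagName) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tagName) + "(\\s|>|/)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Downloads the page body exactly as the server sends it; no scripts are run.
    // Requires Java 11+ for java.net.http.
    static String fetchRaw(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for served markup: the list stays empty
        // until a script fills it in the browser.
        String served = "<ul id=\"ul_farm_results\"></ul>"
                + "<script>/* AJAX fills the list */</script>";
        System.out.println(countOpeningTags(served, "li"));
        // For a live page: countOpeningTags(fetchRaw(args[0]), "li")
    }
}
```

If the count for the fetched page is zero while the browser inspector shows 'li' elements, the list is being built by JavaScript after the page loads, and a static parser alone will not see it.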

You can use something like HtmlUnit, which is basically a GUI-less browser, so it can handle scripts.

You can try something like this after getting the HtmlUnit library:

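    // HtmlUnit 2.x packages shown; 3.x uses org.htmlunit instead of com.gargoylesoftware.htmlunit
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.*;

    String url = "https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco?search=Tachi";
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage(url);
        // Assumption: allow up to 10 s for the AJAX calls that populate the list
        webClient.waitForBackgroundJavaScript(10_000);
        final HtmlUnorderedList list = page.getHtmlElementById("ul_farm_results");
        System.out.println(list.asText()); // newer HtmlUnit releases: asNormalizedText()
    }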

I couldn't test the code because the website's certificate is improperly configured and I didn't want to import its certificate. You may want to take a look at this to resolve the certificate errors.

Answered by Shakhar on Feb 14 '26 at 17:02


