Getting Jsoup to support HTML dynamically generated by JavaScript

Right now I'm working on a web crawler. It should parse some specific sites and write the output to an XML file. Up to this point there has been no problem: the crawler works, and you can customize it really quickly via a cfg file. I use Jsoup to parse the HTML content.

I just added a few more sites and noticed that I have a big problem with HTML content that is created via JavaScript. Is there a way to make Jsoup support JavaScript, or at least to get the full HTML content I can see in my browser?

I already tried HtmlUnit, but it didn't work well: it did not give me the content I get in my browser.

Sincerely,

Ogofo

asked Sep 27 '12 15:09

Ogofo

1 Answer

Jsoup does not support JavaScript and it does not emulate a browser, so forget about it if you're planning to execute JavaScript. In my experience HtmlUnit, which is a headless browser, has given me the best results (among Java frameworks, that is).

One thing worth trying in HtmlUnit is changing the BrowserVersion (Chrome / Internet Explorer / Firefox) when creating the WebClient instance. Some sites respond differently depending on the browser, and sometimes just changing that value gives you the results you expect.
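As a rough sketch of that idea (not a definitive setup), the snippet below lets HtmlUnit load the page and run its scripts, then hands the resulting markup to Jsoup so the existing parsing code can stay as it is. It assumes HtmlUnit 3.x or later (package `org.htmlunit`; 2.x releases use `com.gargoylesoftware.htmlunit` instead) and Jsoup on the classpath. The class name `RenderedFetcher`, the method `fetchRendered`, the 5 second wait and the example URL are made up for illustration.

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper: renders a page with HtmlUnit, then parses it with Jsoup.
class RenderedFetcher {

    static Document fetchRendered(String url, BrowserVersion version) throws IOException {
        // The BrowserVersion passed here is the value the answer suggests varying.
        try (WebClient client = new WebClient(version)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false);
            // Many real-world pages contain script errors; do not abort on them.
            client.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = client.getPage(url);
            // Give asynchronous scripts some time to finish (5 s is an arbitrary guess).
            client.waitForBackgroundJavaScript(5_000);

            // Serialize the DOM as it looks after script execution and re-parse it with Jsoup.
            return Jsoup.parse(page.asXml(), url);
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "https://example.com/"; // placeholder URL
        // Try more than one emulated browser, since sites may respond differently.
        for (BrowserVersion version : new BrowserVersion[] {BrowserVersion.CHROME, BrowserVersion.FIREFOX}) {
            Document doc = fetchRendered(url, version);
            System.out.println(version.getNickname() + ": " + doc.title());
        }
    }
}
```

If one BrowserVersion returns incomplete content, comparing the output for the others is a cheap first check before reaching for a heavier tool.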

answered Nov 01 '22 01:11

Mosty Mostacho

Tags: java, javascript, html, jsoup, htmlunit
