
Page content is loaded with JavaScript and Jsoup doesn't see it

One block on the page is filled with content by JavaScript, and after loading the page with Jsoup none of that information is there. Is there a way to also get JavaScript-generated content when parsing a page with Jsoup?

Can't paste page code here, since it is too long: http://pastebin.com/qw4Rfqgw

Here's the element whose content I need: <div id='tags_list'></div>

I need to get this information in Java, preferably using Jsoup. The element is filled in by JavaScript:

<div id="tags_list">
    <a href="/tagsc0t20099.html" style="font-size:14;">разведчик</a>
    <a href="/tagsc0t1879.html" style="font-size:14;">Sr</a>
    <a href="/tagsc0t3140.html" style="font-size:14;">стратегический</a>
</div>

Java code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class Test {
    public static void main( String[] args )
    {
        try
        {
            Document Doc = Jsoup.connect( "http://www.bestreferat.ru/referat-32558.html" ).get();
            Elements Tags = Doc.select( "#tags_list a" );

            for ( Element Tag : Tags )
            {
                System.out.println( Tag.text() );
            }
        }
        catch ( IOException e )
        {
            e.printStackTrace();
        }
    }
}
Eugene asked Sep 20 '11 17:09



1 Answer

Jsoup is an HTML parser, not an embedded browser engine. It never executes scripts, so it is completely unaware of any content that JavaScript adds to the DOM after the initial page load.
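
To make that concrete, here is a minimal, dependency-free sketch of what any plain HTML parser has to work with. The markup in `SERVER_HTML` is a made-up stand-in for the server's response, and the class and method names (`StaticHtmlDemo`, `tagsListContent`) are hypothetical. A small regex is used in place of Jsoup only so the example runs without extra libraries; it is not a substitute for a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaticHtmlDemo {

    // Assumed sample of what the server sends: the div is empty, and a
    // script that only a browser would execute is supposed to fill it.
    public static final String SERVER_HTML =
            "<html><body>"
            + "<div id='tags_list'></div>"
            + "<script>"
            + "document.getElementById('tags_list').innerHTML ="
            + " '<a href=\"/tagsc0t1879.html\">Sr</a>';"
            + "</script>"
            + "</body></html>";

    // Returns the markup between the opening and closing tags of the
    // tags_list div, or null if the div is absent. The lazy match is only
    // adequate for flat samples like this one (no nested divs).
    public static String tagsListContent(String html) {
        Pattern p = Pattern.compile(
                "<div id=['\"]tags_list['\"]>(.*?)</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Nothing here runs the <script>, so the div stays as it was sent.
        String content = tagsListContent(SERVER_HTML);
        System.out.println("tags_list content: \"" + content + "\"");
        System.out.println("empty: " + content.isEmpty());
    }
}
```

The script text is present in the response, but nothing executes it, so the div's content comes back as an empty string. Jsoup is in the same position: `Doc.select( "#tags_list a" )` has no links to match until something actually runs the script.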

To get access to that kind of content you will need an embedded browser component. There are a number of discussions on SO about such components, e.g. "Is there a way to embed a browser in Java?"

fvu answered Sep 22 '22 23:09
