
Parse JavaScript with jsoup

I want to read the value of a JavaScript variable from an HTML page.
Below is a snippet of the page:

<input id="hidval" value="" type="hidden"> 
<form method="post" style="padding: 0px;margin: 0px;" name="profile" autocomplete="off">
<input name="pqRjnA" id="pqRjnA" value="" type="hidden">
<script type="text/javascript">
    key="pqRjnA";
</script>

My aim is to read the value of the variable key from this page using jsoup.
Is this possible with jsoup? If so, how?

Ravi Joshi asked Feb 15 '13 22:02





1 Answer

jsoup is an HTML parser, not a JavaScript library, so it will not evaluate the script for you. You have two ways to solve this:

A. Use a JavaScript library

  • Pro:

    • Full JavaScript support
  • Con:

    • Additional library / dependencies

B. Use jsoup + manual parsing

  • Pro:

    • No extra libraries required
    • Enough for simple tasks
  • Con:

    • Not as flexible as a JavaScript library

Here's an example of how to get the key with jsoup and some "manual" code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = ...
Element script = doc.select("script").first(); // first <script> element in the document

Pattern p = Pattern.compile("(?is)key=\"(.+?)\""); // captures the quoted value assigned to key
Matcher m = p.matcher(script.html()); // use html() here, NOT text(): text() drops the script content

while (m.find())
{
    System.out.println(m.group());  // the whole match (key="value")
    System.out.println(m.group(1)); // value only
}

Output (using your html part):

key="pqRjnA"
pqRjnA
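
If the script on the real page is formatted a little differently (spaces around the equals sign, single quotes), the regex above will miss it. Below is a minimal sketch of a more tolerant version of the "manual" step, using only the standard library. The class name ScriptKeyExtractor and the method extractKey are hypothetical names chosen for illustration, and the string passed in stands in for what script.html() returns in the snippet above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: pulls the value assigned to
// the variable "key" out of the raw text of a <script> element.
final class ScriptKeyExtractor {

    // Matches key = "value" or key = 'value', with optional whitespace
    // around '='. The \b stops names that merely end in "key" from matching,
    // and \1 requires the closing quote to be the same as the opening one.
    private static final Pattern KEY_PATTERN = Pattern.compile(
            "\\bkey\\s*=\\s*([\"'])(.+?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the first assigned value, or Optional.empty() if none is found.
    static Optional<String> extractKey(String scriptBody) {
        Matcher m = KEY_PATTERN.matcher(scriptBody);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for script.html() on the page from the question.
        String scriptBody = "\n    key=\"pqRjnA\";\n";
        System.out.println(extractKey(scriptBody).orElse("(no key found)"));
    }
}
```

Returning an Optional instead of looping over matches makes the "script has no key assignment" case explicit for the caller. This is still plain pattern matching, so it only works for simple literal assignments like the one in the question.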
ollo answered Sep 28 '22 19:09
