I am trying to parse a web page that contains some JavaScript. So far I have been using Jsoup to parse the HTML in Java, which works as expected, but I am unable to parse the JavaScript. Below is a snippet of the HTML page:
<script type="text/javascript">
var element = document.createElement("input");
element.setAttribute("type", "hidden");
element.setAttribute("value", "");
element.setAttribute("name", "AzPwXPs");
element.setAttribute("id", "AzPwXPs");
var foo = document.getElementById("dnipb");
foo.appendChild(element);
var element1 = document.createElement("input");
element1.setAttribute("type", "hidden");
element1.setAttribute("value", "6D6AB8AECC9B28235F1DE39D879537E1");
element1.setAttribute("name", "ZLZWNK");
element1.setAttribute("id", "ZLZWNK");
foo.appendChild(element1);
</script>
I want to read both values along with their name/id, so that after parsing I get the following results:
AzPwXPs=
ZLZWNK=6D6AB8AECC9B28235F1DE39D879537E1
How can I parse the page in this situation?
I have stumbled upon this question a few times while searching for a way to parse pages with JavaScript, but the solutions provided are not perfect. I found a pure Java solution to the problem: use JBrowserDriver to load the page and run its JavaScript, then Jsoup to parse the resulting HTML.
Simple example:
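import com.machinepublishers.jbrowserdriver.JBrowserDriver;
import com.machinepublishers.jbrowserdriver.Settings;
import com.machinepublishers.jbrowserdriver.Timezone;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
// JBrowserDriver part: load the page so that its JavaScript is executed
JBrowserDriver driver = new JBrowserDriver(Settings.builder()
        .timezone(Timezone.EUROPE_ATHENS)
        .build());
String loadedPage;
try {
    driver.get(FETCH_URL);
    loadedPage = driver.getPageSource();
} finally {
    driver.quit(); // always shut the browser down, even if loading fails
}
// Jsoup parsing part: query the rendered HTML
Document document = Jsoup.parse(loadedPage);
Elements elements = document.select("#nav-console span.data");
log.info("Found element count: {}", elements.size());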
I ran into the same situation when I had to find URLs in CSS files.
Put the JavaScript in a string and apply a regular expression:
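import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Matches url(...) with optional single or double quotes; group 2 is the URL itself
Pattern p = Pattern.compile("url\\(\\s*(['\"]?+)(.*?)\\1\\s*\\)");
Matcher m = p.matcher(content);
while (m.find()) {
    String urlFound = m.group(2); // m.group() would return the whole url(...) match
}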
Regards, Hugo Pedrosa
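The regular-expression idea can be applied to the script from the question roughly as follows. This is only a sketch under some assumptions: the class name HiddenFieldExtractor and its extract method are made up for illustration, the script text is assumed to be available as a string already (with Jsoup it could be read from the script element via data()), and every setAttribute call is assumed to use double-quoted literals as in the snippet. It collects the attributes set on each JavaScript variable and then pairs each "name" with its "value".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or any other library.
class HiddenFieldExtractor {

    // Matches calls such as: element1.setAttribute("name", "ZLZWNK")
    // group 1 = JS variable, group 2 = attribute name, group 3 = attribute value
    private static final Pattern SET_ATTRIBUTE = Pattern.compile(
            "(\\w+)\\.setAttribute\\(\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*\\)");

    static Map<String, String> extract(String script) {
        // Collect all attributes per JS variable, keeping the order of appearance
        Map<String, Map<String, String>> attrsByVariable = new LinkedHashMap<>();
        Matcher m = SET_ATTRIBUTE.matcher(script);
        while (m.find()) {
            attrsByVariable
                    .computeIfAbsent(m.group(1), k -> new LinkedHashMap<>())
                    .put(m.group(2), m.group(3));
        }

        // Pair each element's "name" attribute with its "value" attribute
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map<String, String> attrs : attrsByVariable.values()) {
            String name = attrs.get("name");
            if (name != null) {
                fields.put(name, attrs.getOrDefault("value", ""));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The script body from the question, shortened to the relevant calls
        String script = String.join("\n",
                "var element = document.createElement(\"input\");",
                "element.setAttribute(\"type\", \"hidden\");",
                "element.setAttribute(\"value\", \"\");",
                "element.setAttribute(\"name\", \"AzPwXPs\");",
                "var element1 = document.createElement(\"input\");",
                "element1.setAttribute(\"type\", \"hidden\");",
                "element1.setAttribute(\"value\", \"6D6AB8AECC9B28235F1DE39D879537E1\");",
                "element1.setAttribute(\"name\", \"ZLZWNK\");");

        extract(script).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Keep in mind that this only reads string literals out of the source text and does not execute anything. If the values are computed at runtime, a real browser engine such as the JBrowserDriver approach above is the safer route.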