I'm trying to pull a page's HTML source out of a WebView in an Android app. I've managed it using this approach: http://lexandera.com/2009/01/extracting-html-from-a-webview/
plus the following, to make it work on KitKat (API 19) and later:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
    // API 19+: the callback receives the script's return value
    webView.evaluateJavascript(
        "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
        new ValueCallback<String>() {
            @Override
            public void onReceiveValue(String html) {
                outputViewer.setText(html);
            }
        });
} else {
    // Older versions: calls showHTML on the JavaScript interface registered as "HTMLOUT" (see the linked article)
    webView.loadUrl("javascript:window.HTMLOUT.showHTML" +
        "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
}
The problem is that the pre-KitKat branch returns exactly what I want, while the KitKat branch returns an escaped version of the markup, wrapped in double quotes, like this:
"\u003Chtml>\u003Chead>\n\t\u003Cmeta charset=\"UTF-8\">\n\t\u003Cmeta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n\t\u003Clink rel=\"profile\" href=\"http://gmpg.org/xfn/11\">\n\t\u003Clink rel=\"pingback\"
Is there a straightforward way to unescape that string on Android?
Mike
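The KitKat output has the shape of a JSON string value: it is wrapped in double quotes, and < appears as \u003C, an escape that JSON serializers commonly use. Assuming that is what evaluateJavascript hands to the callback, decoding the value as a JSON string literal undoes exactly that encoding. The sketch below uses only the JDK so it runs on a plain JVM; JsResultDecoder and decode are hypothetical names, not Android APIs (on a device, android.util.JsonReader in lenient mode may do the same job).

```java
// Hypothetical JDK-only helper (not an Android API) that decodes a JSON string
// literal, such as the value the KitKat callback appears to receive.
final class JsResultDecoder {

    private JsResultDecoder() {}

    // Strips the surrounding double quotes and resolves the JSON escapes
    // \" \\ \/ \b \f \n \r \t and \\u followed by four hex digits.
    // Returns null for the JSON literal null.
    static String decode(String json) {
        if (json == null || json.equals("null")) {
            return null;
        }
        int end = json.length() - 1; // index of the closing quote
        if (end < 1 || json.charAt(0) != '"' || json.charAt(end) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal: " + json);
        }
        StringBuilder out = new StringBuilder(end);
        for (int i = 1; i < end; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            i++; // move to the character after the backslash
            if (i >= end) {
                throw new IllegalArgumentException("Dangling backslash");
            }
            switch (json.charAt(i)) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u': {
                    if (i + 4 >= end) {
                        throw new IllegalArgumentException("Truncated unicode escape");
                    }
                    int code = 0;
                    for (int k = i + 1; k <= i + 4; k++) {
                        int digit = Character.digit(json.charAt(k), 16);
                        if (digit < 0) {
                            throw new IllegalArgumentException("Bad hex digit in escape");
                        }
                        code = code * 16 + digit;
                    }
                    // Non-BMP characters arrive as two escapes (a surrogate pair);
                    // appending each UTF-16 unit in turn rebuilds them correctly.
                    out.append((char) code);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unknown escape: \\" + json.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shaped like the beginning of the KitKat output quoted above.
        String raw = "\"\\u003Chtml>\\u003Chead>\\n\\t\\u003Cmeta charset=\\\"UTF-8\\\">\"";
        System.out.println(decode(raw));
    }
}
```

In the callback you would then write something like outputViewer.setText(JsResultDecoder.decode(html)). Treating the value as JSON rather than as Java escapes also takes care of the surrounding quotes.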
JavaScript's unescape() function takes a string encoded by the escape() function and decodes it: each hexadecimal escape sequence in the string (%XX or %uXXXX) is replaced by the character it represents.
Using the escape character (\): we can use the backslash escape character to prevent JavaScript from interpreting a quote as the end of the string. The sequence \' always produces a single quote and \" always produces a double quote, without breaking the string.
The escape() function encodes a string so that it is safe to use in a URL; unescape() reverses that encoding.
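To make the decoding rule described above concrete, here is a Java sketch of what the legacy unescape() does: %XX and %uXXXX sequences become the corresponding UTF-16 code unit, and everything else, including a % that is not followed by valid hex digits, is copied through unchanged. LegacyUnescape is a hypothetical name. Note that this percent-style encoding is not what the evaluateJavascript output uses; that output uses backslash escapes, so this function would not unescape it.

```java
// Hypothetical Java port of the decoding rule used by JavaScript's legacy
// unescape(): "%uXXXX" and "%XX" become the corresponding UTF-16 code unit,
// and anything else (including a malformed "%") is copied unchanged.
final class LegacyUnescape {

    private LegacyUnescape() {}

    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                // Try the four-digit %uXXXX form first, then the two-digit %XX form.
                if (i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                    int code = parseHex(s, i + 2, 4);
                    if (code >= 0) {
                        out.append((char) code);
                        i += 6;
                        continue;
                    }
                }
                if (i + 3 <= s.length()) {
                    int code = parseHex(s, i + 1, 2);
                    if (code >= 0) {
                        out.append((char) code);
                        i += 3;
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Parses len ASCII hex digits starting at from; returns -1 if any is invalid.
    private static int parseHex(String s, int from, int len) {
        int value = 0;
        for (int k = from; k < from + len; k++) {
            char ch = s.charAt(k);
            int digit;
            if (ch >= '0' && ch <= '9') {
                digit = ch - '0';
            } else if (ch >= 'a' && ch <= 'f') {
                digit = ch - 'a' + 10;
            } else if (ch >= 'A' && ch <= 'F') {
                digit = ch - 'A' + 10;
            } else {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unescape("%3Chtml%3E%20%u0041")); // prints "<html> A"
    }
}
```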
Unescaping HTML entities with a text area: one way to unescape HTML entities is to put the escaped text into a textarea element. The browser decodes the entities, so reading the textarea's value back returns the unescaped text. An htmlDecode function can wrap this, taking the escaped string as its input parameter.
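Outside a browser there is no textarea to do the work. As an illustration of what that trick achieves, here is a hypothetical JDK-only EntityDecoder that handles numeric character references and the five XML named entities (&lt; &gt; &amp; &quot; &apos;). Full HTML defines more than two thousand named references, so in practice a library decoder such as StringEscapeUtils.unescapeHtml4 from Apache Commons Text is the safer choice.

```java
// Hypothetical minimal HTML entity decoder: handles numeric references such as
// &#65; and &#x41; plus the five XML named entities. It is not a full HTML decoder.
final class EntityDecoder {

    private EntityDecoder() {}

    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i + 1) : -1;
            if (semi > i + 1) {
                String replacement = resolve(s.substring(i + 1, semi));
                if (replacement != null) {
                    out.append(replacement);
                    i = semi + 1; // single pass: "&amp;lt;" becomes "&lt;", not "<"
                    continue;
                }
            }
            out.append(c); // not a recognised entity: keep the text as-is
            i++;
        }
        return out.toString();
    }

    // Maps an entity body (without '&' and ';') to its text, or null if unknown.
    private static String resolve(String name) {
        switch (name) {
            case "lt":   return "<";
            case "gt":   return ">";
            case "amp":  return "&";
            case "quot": return "\"";
            case "apos": return "'";
            default:     break;
        }
        if (name.length() > 1 && name.charAt(0) == '#') {
            boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
            String digits = name.substring(hex ? 2 : 1);
            // Integer.parseInt would accept a leading sign, so reject it explicitly.
            if (digits.isEmpty() || digits.charAt(0) == '+' || digits.charAt(0) == '-') {
                return null;
            }
            try {
                int codePoint = Integer.parseInt(digits, hex ? 16 : 10);
                if (Character.isValidCodePoint(codePoint)) {
                    return new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // Malformed numeric reference: fall through and keep it literally.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;meta charset=&quot;UTF-8&quot;&gt;"));
    }
}
```

Decoding in a single left-to-right pass matters: a naive chain of replace() calls that handles &amp; before &lt; would turn &amp;lt; into < instead of &lt;.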
I had the same problem. It looks like the string is Java-escaped, and since I'm already using Apache Commons Lang, this worked for me:
str = StringEscapeUtils.unescapeJava(str);
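Before:
"\u003Chtml lang=\"en\">\u003Chead> \u003Cmeta content=\"width=device-width,minimum-scale=1.0\"...
After:
"<html lang="en"><head> <meta content="width=device-width,minimum-scale=1.0"...
The surrounding double quotes survive unescaping, so they still have to be stripped, for example with str.substring(1, str.length() - 1).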
I took the code from the question "Convert escaped Unicode character back to actual character".