I am using the code below to get the page HTML, but I am not getting plain HTML: it contains escaped characters such as \u003C. I am parsing it with the Jsoup parser, which is not able to parse this HTML.
webview.evaluateJavascript(
"(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
new ValueCallback<String>() {
@Override
public void onReceiveValue(String html) {
}
});
This is the HTML string I get from the code above:
"\u003Chtml>\u003Chead>\n \u003Cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n \u003Cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n \u003Clink rel=\"shortcut icon\" href=\"https://www.xyx.com/favicon.ico\" type=\"image/x-icon\">\n \u003Clink rel=\"icon\" href=\"https://www.xyx.com/favicon.ico\" type=\"image/x-icon\">\n \n \u003Ctitle>Page Not Found! : BJSBuzz\u003C/title>\n\n \u003C!-- \n\tOpen Source Social Network (Ossn)/script>\u003C/body>\u003C/html>"
The value passed to the callback is a JSON-encoded string, so you should use JsonReader (android.util.JsonReader) to parse it:
webView.evaluateJavascript("(function() {return document.getElementsByTagName('html')[0].outerHTML;})();", new ValueCallback<String>() {
@Override
public void onReceiveValue(final String value) {
JsonReader reader = new JsonReader(new StringReader(value));
reader.setLenient(true);
try {
if(reader.peek() == JsonToken.STRING) {
String domStr = reader.nextString();
if(domStr != null) {
handleResponseSuccessByBody(domStr);
}
}
} catch (IOException e) {
// handle exception
} finally {
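// The original used IoUtil.close(reader), a helper that is not part of the Android SDK; closing directly does the same job
try { reader.close(); } catch (IOException ignored) { }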
}
}
});
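Try this, where v is the string received in onReceiveValue (the substring call strips the surrounding quotes):
v = StringEscapeUtils.unescapeJavaScript(v.substring(1, v.length() - 1));
unescapeJavaScript is from Apache commons-lang (in commons-lang3 the method is named unescapeEcmaScript).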
So much string processing for an Android WebView, why...
The removeUTFCharacters method provided in the previous answer does not clean the string fully. Escapes such as \" still remain.
To decode the \uXXXX escapes, use this function:
public static StringBuffer removeUTFCharacters(String data) {
Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
Matcher m = p.matcher(data);
StringBuffer buf = new StringBuffer(data.length());
while (m.find()) {
String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
m.appendReplacement(buf, Matcher.quoteReplacement(ch));
}
m.appendTail(buf);
return buf;
}
and call it inside the onReceiveValue(String html) like this:
@Override
public void onReceiveValue(String html) {
String result = removeUTFCharacters(html).toString();
}
You will obtain the HTML string with the \uXXXX escapes decoded.
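Since removeUTFCharacters only touches \uXXXX sequences, the surrounding quotes and escapes like \" and \n stay in the result. Below is a minimal, dependency-free sketch that decodes the whole JSON string literal in one pass. The class name WebViewResultDecoder and the method decode are made up for illustration, and it assumes the callback value is a single JSON string (or the literal null). On Android itself, JsonReader does the same job without hand-written parsing.

```java
final class WebViewResultDecoder {

    // Decodes the JSON string literal passed to onReceiveValue into plain text.
    // Hypothetical helper, not part of any library.
    static String decode(String json) {
        if (json == null || json.equals("null")) {
            return null; // the script returned null or undefined
        }
        String s = json;
        // Strip the double quotes that wrap a JSON string literal.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < s.length()) {
                        // Four hex digits follow; malformed digits throw NumberFormatException.
                        out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                        i += 4;
                    } else {
                        out.append("\\u"); // truncated escape, keep it as-is
                    }
                    break;
                default:
                    out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\"\\u003Chtml>\\u003Chead>\\n \\u003Ctitle>Test\\u003C/title>\\u003C/head>\\u003C/html>\"";
        System.out.println(decode(raw));
    }
}
```

Inside onReceiveValue(String html) you would call WebViewResultDecoder.decode(html) and pass the result to Jsoup.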
Bye, Alex