I am using the code below to get the page HTML from a WebView, but the result is not plain HTML: it contains escape sequences that have not been decoded. I am parsing it with jsoup, which cannot parse this string.
webview.evaluateJavascript(
                        "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
                        new ValueCallback<String>() {
                            @Override
                            public void onReceiveValue(String html) {
                            }
                        });
This is the HTML string I get from the code above:
"\u003Chtml>\u003Chead>\n    \u003Cmeta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n    \u003Cmeta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    \u003Clink rel=\"shortcut icon\" href=\"https://www.xyx.com/favicon.ico\" type=\"image/x-icon\">\n    \u003Clink rel=\"icon\" href=\"https://www.xyx.com/favicon.ico\" type=\"image/x-icon\">\n    \n    \u003Ctitle>Page Not Found! : BJSBuzz\u003C/title>\n\n    \u003C!-- \n\tOpen Source Social Network (Ossn)/script>\u003C/body>\u003C/html>"
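You should use JsonReader to parse the value, because evaluateJavascript passes the result to the callback as a JSON-encoded value (here, a quoted and escaped string):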
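// Needs: android.util.JsonReader, android.util.JsonToken, java.io.IOException, java.io.StringReader
webView.evaluateJavascript("(function() { return document.getElementsByTagName('html')[0].outerHTML; })();", new ValueCallback<String>() {
    @Override
    public void onReceiveValue(final String value) {
        // The callback value is a JSON string literal; JsonReader strips the quotes and decodes the escapes.
        // try-with-resources closes the reader, so no separate close helper is needed.
        try (JsonReader reader = new JsonReader(new StringReader(value))) {
            reader.setLenient(true);
            if (reader.peek() == JsonToken.STRING) {
                String domStr = reader.nextString();
                if (domStr != null) {
                    handleResponseSuccessByBody(domStr); // your own handler for the decoded HTML
                }
            }
        } catch (IOException e) {
            // handle exception
        }
    }
});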
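Try this, where v is the string passed to onReceiveValue:
v = StringEscapeUtils.unescapeJavaScript(v.substring(1, v.length() - 1));
The substring call strips the surrounding double quotes. unescapeJavaScript is from Apache commons-lang 2.x; in commons-lang3 the equivalent method is named unescapeEcmaScript.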
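If pulling in commons-lang only for this step is undesirable, the same unquote-and-unescape idea can be sketched in plain Java. This is a minimal sketch, not a full JSON parser: the class name JsResultDecoder and the method decode are made up for illustration, and it assumes the input is the quoted string (or the literal text null) that the callback receives.

```java
public class JsResultDecoder {

    // Hypothetical helper: turns the quoted, escaped callback value into plain text.
    public static String decode(String raw) {
        // A script with no result is reported as the text "null".
        if (raw == null || raw.equals("null")) {
            return null;
        }
        String s = raw;
        // Strip the surrounding double quotes, if present.
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    // Backslash-u followed by four hex digits encodes one UTF-16 unit.
                    if (i + 5 <= s.length()) {
                        try {
                            out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                            i += 4;
                            break;
                        } catch (NumberFormatException e) {
                            // Not valid hex: fall through and keep the text as-is.
                        }
                    }
                    out.append("\\u");
                    break;
                default:
                    // Covers escaped quote, backslash and slash: keep the character itself.
                    out.append(next);
            }
        }
        return out.toString();
    }
}
```

Inside onReceiveValue(String html) this would be called as String clean = JsResultDecoder.decode(html); and the result handed to jsoup.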
So much string processing for an Android WebView, why...
The removeUTFCharacters method provided in another answer does not clean the string completely. Escape sequences such as \" still remain.
To remove the \uXXXX escape sequences, use this function:
 public static StringBuffer removeUTFCharacters(String data) {
        Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
        Matcher m = p.matcher(data);
        StringBuffer buf = new StringBuffer(data.length());
        while (m.find()) {
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(buf, Matcher.quoteReplacement(ch));
        }
        m.appendTail(buf);
        return buf;
    }
and call it inside the onReceiveValue(String html) like this:
@Override
public void onReceiveValue(String html) {
    String result = removeUTFCharacters(html).toString();
}
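You will obtain a string in which the \uXXXX sequences have been decoded. Other escapes, such as \" and \n, and the surrounding double quotes are left untouched.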
Bye, Alex
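To see what this function does and does not decode, here is a small self-contained check. The wrapper class RemoveUtfDemo and the sample input are made up for illustration; the method body is copied from the answer so the demo compiles on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveUtfDemo {

    // Same logic as removeUTFCharacters in the answer, repeated here for a standalone demo.
    public static StringBuffer removeUTFCharacters(String data) {
        Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
        Matcher m = p.matcher(data);
        StringBuffer buf = new StringBuffer(data.length());
        while (m.find()) {
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(buf, Matcher.quoteReplacement(ch));
        }
        m.appendTail(buf);
        return buf;
    }

    public static void main(String[] args) {
        // Sample callback value: quoted, with backslash-u, escaped-quote and newline escapes.
        String raw = "\"\\u003Cp class=\\\"x\\\">hi\\u003C/p>\\n\"";
        // Only the backslash-u sequences are decoded; quotes and the other escapes stay.
        // prints: "<p class=\"x\">hi</p>\n"
        System.out.println(removeUTFCharacters(raw));
    }
}
```

The output still carries the outer quotes and the escaped inner quotes, so it would need further unescaping before being passed to an HTML parser.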