
How do I unescape a string returned from JavaScript on Android?

I'm trying to pull the source of a web page out of a WebView in an Android app. I got it working with the approach described at http://lexandera.com/2009/01/extracting-html-from-a-webview/

plus the following branch so that it also works on KitKat and later:

    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
        // KitKat and later: the script result is delivered to the callback.
        webView.evaluateJavascript(
                "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
                new ValueCallback<String>() {
                    @Override
                    public void onReceiveValue(String html) {
                        outputViewer.setText(html);
                    }
                });
    } else {
        // Before KitKat: pass the HTML to the HTMLOUT JavaScript interface.
        webView.loadUrl("javascript:window.HTMLOUT.showHTML" +
                "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }

The problem is that the pre-KitKat branch returns exactly what I want, while the KitKat branch returns an escaped version of the markup, something like this:

"\u003Chtml>\u003Chead>\n\t\u003Cmeta charset=\"UTF-8\">\n\t\u003Cmeta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n\t\u003Clink rel=\"profile\" href=\"http://gmpg.org/xfn/11\">\n\t\u003Clink rel=\"pingback\" 

Is there a straightforward way to unescape that string on Android?

Mike

asked Jan 05 '16 20:01 by MikeCoverUps



1 Answer

I had the same problem. The returned string looks Java-escaped, and since I was already using Apache Commons Lang, this worked for me:

str = StringEscapeUtils.unescapeJava(str);

Before:

"\u003Chtml lang=\"en\">\u003Chead> \u003Cmeta content=\"width=device-width,minimum-scale=1.0\"...

After:

"<html lang="en"><head> <meta content="width=device-width,minimum-scale=1.0"...

I took the code from:

Convert escaped Unicode character back to actual character
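
If you would rather not pull in Commons Lang, the same unescaping can be done by hand. Below is a minimal sketch, assuming the value passed to onReceiveValue is a double-quoted, JSON-style string literal like the one in the question. JsStringUnescaper and its unescape method are hypothetical names made up for this illustration, not part of Android or Commons Lang. Unlike the output shown above, this version also strips the surrounding double quotes.

```java
// Hypothetical helper (not an Android or Commons Lang API): decodes a
// double-quoted, backslash-escaped string literal into plain text.
final class JsStringUnescaper {

    private JsStringUnescaper() {}

    static String unescape(String raw) {
        if (raw == null) {
            return null;
        }
        String s = raw;
        // Drop the surrounding double quotes of the literal, if present.
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            s = s.substring(1, s.length() - 1);
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u': {
                    // Unicode escape: expect exactly four hex digits after the 'u'.
                    boolean ok = i + 4 < s.length();
                    int code = 0;
                    for (int k = 1; ok && k <= 4; k++) {
                        int digit = Character.digit(s.charAt(i + k), 16);
                        if (digit < 0) {
                            ok = false;
                        } else {
                            code = code * 16 + digit;
                        }
                    }
                    if (ok) {
                        out.append((char) code);
                        i += 4; // skip the four hex digits just consumed
                    } else {
                        out.append("\\u"); // malformed escape: keep it verbatim
                    }
                    break;
                }
                default:
                    // Covers escaped quote, backslash and slash: keep the character itself.
                    out.append(next);
            }
        }
        return out.toString();
    }
}
```

In the callback from the question this would be used as outputViewer.setText(JsStringUnescaper.unescape(html)). On Android, parsing the value with the platform's JSON classes may be a sturdier choice than a hand-written loop, since the sketch only covers the escapes listed in the switch.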

answered Sep 18 '22 06:09 by carrizo