Uri and WebView classes parsing URLs containing backslashes in authority (host or user information) differently

When using the URIs

String myUri = "https://evil.example.com\\.good.example.org/";
// or
String myUri = "https://evil.example.com\\@good.example.org/";

in Java on Android, the backslash in the host or user information of the authority part of the URI causes a mismatch between how Android’s android.net.Uri and android.webkit.WebView parse the URI with regard to its host.

  • The Uri class (and cURL) treat evil.example.com\.good.example.org (first example) or even good.example.org (second example) as the URI’s host.
  • The WebView class (and Firefox and Chrome) treat evil.example.com (both examples) as the URI’s host.
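
One likely explanation for the browser-side result is the WHATWG URL Standard, which browsers follow and which ends the authority at a backslash in http and https URLs exactly as it does at a forward slash. The following is only a minimal sketch of that rule in plain Java, using java.net.URI rather than android.net.Uri so that it runs off-device. The class and method names are hypothetical, and it is not how WebView is implemented.

```java
import java.net.URI;

final class BrowserLikeHost {
    // Approximates the host a browser-style parser would report for an http(s) URL:
    // a backslash terminates the authority just like a forward slash.
    // Simplification: backslashes are replaced everywhere, including query and fragment.
    static String hostOf(String url) {
        String normalized = url.replace('\\', '/');
        // URI.create throws IllegalArgumentException if the result is still malformed.
        return URI.create(normalized).getHost();
    }

    public static void main(String[] args) {
        System.out.println(hostOf("https://evil.example.com\\.good.example.org/"));
        System.out.println(hostOf("https://evil.example.com\\@good.example.org/"));
    }
}
```

Under this reading, both example URIs resolve to the host evil.example.com, and the remainder becomes part of the path.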

Is this known, expected or correct behavior? Do the two classes simply follow different standards?

Looking at the specification, it seems neither RFC 2396 nor RFC 3986 allows for a backslash in the user information or authority.
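
As a quick cross-check of that reading, java.net.URI (documented as following RFC 2396) can serve as a strict reference parser, since it throws instead of guessing at a host. This is a sketch with hypothetical names, not part of the Android Uri or WebView APIs:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class StrictParseCheck {
    // Returns true only if java.net.URI accepts the string as a valid URI reference.
    static boolean isAcceptedByStrictParser(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            // A raw backslash is not a legal URI character, so parsing fails here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedByStrictParser("https://evil.example.com\\.good.example.org/"));
        System.out.println(isAcceptedByStrictParser("https://evil.example.com\\@good.example.org/"));
        System.out.println(isAcceptedByStrictParser("https://good.example.org/"));
    }
}
```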

Is there any workaround to ensure a consistent behavior here, especially for validation purposes? Does the following patch look reasonable (to be used with WebView and for general correctness)?

Uri myParsedUri = Uri.parse(myUri);

if ((myParsedUri.getHost() == null || !myParsedUri.getHost().contains("\\")) && (myParsedUri.getUserInfo() == null || !myParsedUri.getUserInfo().contains("\\"))) {
    // valid URI
}
else {
    // invalid URI
}

One possible flaw is that this workaround may not catch all the cases that cause inconsistent hosts to be parsed. Do you know of anything else (apart from a backslash) that causes a mismatch between the two classes?
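
One way to reduce that risk is to stop enumerating bad characters and instead accept only what RFC 3986 permits, then compare the parsed host against an explicit allowlist. The sketch below assumes plain Java with java.net.URI. The class name, method name, and the allowlist itself are hypothetical, and rejecting every URL that carries user information is a design choice made here, not a requirement of the RFC.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

final class UrlAllowlistCheck {
    // Characters RFC 3986 permits in a URI: unreserved, gen-delims, sub-delims, and '%'.
    private static final String ALLOWED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#[]@!$&'()*+,;=%";

    static boolean isSafe(String url, Set<String> allowedHosts) {
        // Reject anything outside the RFC 3986 character set (backslash included)
        // before any parser gets a chance to interpret it.
        for (int i = 0; i < url.length(); i++) {
            if (ALLOWED.indexOf(url.charAt(i)) < 0) {
                return false;
            }
        }
        try {
            URI uri = URI.create(url);
            String host = uri.getHost();
            // Refuse user information entirely and require an exact host match.
            return uri.getUserInfo() == null
                    && host != null
                    && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = Collections.singleton("good.example.org");
        System.out.println(isSafe("https://good.example.org/path", allowed));
        System.out.println(isSafe("https://evil.example.com\\@good.example.org/", allowed));
    }
}
```

Only a URL that passes such a check would then be handed to WebView, so both parsers see input that leaves no room for disagreement about the backslash.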

asked Nov 08 '22 06:11 by caw


1 Answer

It is known that Android WebView 4.4 rewrites some URLs; the linked issue describes steps to prevent that. It is not entirely clear from your question whether your need stems from that issue or from something else.

You can mask backslashes and other characters with their corresponding number from the character table. In URLs, that number is written in hexadecimal.

Hexadecimal: 5C
Decimal: 92
Character: \

The code is then prefixed with a % for each such character in the URL. After the replacement, your code looks like this:

// "\\" in Java source is a single backslash, so it becomes a single %5C
String myUri = "https://evil.example.com%5C.good.example.org/";
// or
String myUri = "https://evil.example.com%5C@good.example.org/";

It may still be necessary to add a slash to separate the domain from the path:

// "\\" in Java source is a single backslash, so it becomes a single %5C
String myUri = "https://evil.example.com/%5C.good.example.org/";
// or
String myUri = "https://evil.example.com/%5C@good.example.org/";
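
The replacement described above can be sketched in Java as follows. The class and method names are hypothetical, and the helper only covers single-byte ASCII characters such as the backslash; other characters would first need to be encoded as UTF-8 bytes.

```java
final class BackslashEscape {
    // Formats an ASCII character as a percent-encoded triplet, e.g. '\' (decimal 92) -> "%5C".
    static String percentEncode(char c) {
        return String.format("%%%02X", (int) c);
    }

    // Replaces every raw backslash in the URL with its percent-encoded form.
    static String escapeBackslashes(String url) {
        return url.replace("\\", percentEncode('\\'));
    }

    public static void main(String[] args) {
        // "\\" in Java source is one backslash character, so it yields one %5C.
        System.out.println(escapeBackslashes("https://evil.example.com\\.good.example.org/"));
        System.out.println(escapeBackslashes("https://evil.example.com\\@good.example.org/"));
    }
}
```

Whether an encoded backslash in the host is then accepted at all still depends on the parser that receives it.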

Is it possible that the backslashes are never meant to be sent over the network at all, and only serve as escapes for some processing step such as regular expressions or JavaScript (JSON) output?

Bonus ;-)
Below is a PHP script that prints table rows for the code points 0000 to FFFF, each with its number in hex and decimal and the character itself. (It should still be wrapped in an HTML template, perhaps including CSS.)

<?php
    $chs = array('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
    $chs2 = $chs;
    $chs3 = $chs;
    $chs4 = $chs;
    foreach ($chs as $ch){
        foreach ($chs2 as $ch2){    
            foreach ($chs3 as $ch3){
                foreach ($chs4 as $ch4){
                    echo '<tr>';
                    echo '<td>';
                    echo $ch.$ch2.$ch3.$ch4;
                    echo '</td>';
                    echo '<td>';
                    echo hexdec($ch.$ch2.$ch3.$ch4);
                    echo '</td>';
                    echo '<td>';
                    echo '&#x'.$ch.$ch2.$ch3.$ch4.';';
                    echo '</td>';
                    echo '</tr>';
                }
            }
        }
    }
?>

answered Nov 15 '22 11:11 by David