 

How do you unescape URLs in Java?

When I read the XML through a URL's InputStream and cut out everything except the URL, I get "http://cliveg.bu.edu/people/sganguly/player/%20Rang%20De%20Basanti%20-%20Tu%20Bin%20Bataye.mp3".

As you can see, there are a lot of "%20"s.

I want the URL to be unescaped.

Is there any way to do this in Java, without using a third-party library?

Penchant asked Mar 08 '09 16:03


2 Answers

This is not unescaped XML; it is URL-encoded text. It looks to me like you want to use the following on the URL strings.

URLDecoder.decode(url); 

This will give you the correct text. The result of decoding the link you provided is this.

http://cliveg.bu.edu/people/sganguly/player/ Rang De Basanti - Tu Bin Bataye.mp3 

The %20 is an escaped space character. To get the above I used the URLDecoder class.
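As a minimal sketch of the decoding step described above, the following runs the URL from the question through URLDecoder. The class name UrlUnescape and the helper unescape are hypothetical names chosen for illustration, and the sketch assumes the two-argument overload with "UTF-8" rather than the one-argument call shown in this answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlUnescape {

    // Hypothetical helper: decodes percent-escapes such as %20, interpreting
    // the escaped bytes as UTF-8 (an assumption; the source of the URL may
    // have used a different encoding).
    static String unescape(String url) throws UnsupportedEncodingException {
        return URLDecoder.decode(url, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String escaped = "http://cliveg.bu.edu/people/sganguly/player/"
                + "%20Rang%20De%20Basanti%20-%20Tu%20Bin%20Bataye.mp3";
        // Each %20 in the input becomes a single space in the output.
        System.out.println(unescape(escaped));
    }
}
```

Note that URLDecoder implements application/x-www-form-urlencoded decoding, so a literal + in the input is also turned into a space. That does not matter for the URL in the question, but it may for others.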

ng. answered Sep 18 '22 13:09


Starting from Java 10, use

URLDecoder.decode(url, StandardCharsets.UTF_8)

For Java 9 and earlier, use URLDecoder.decode(url, "UTF-8").

URLDecoder.decode(String s) has been deprecated since Java 5.

Regarding the chosen encoding:

Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.
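To illustrate why the chosen encoding matters, here is a small sketch that decodes the same escaped string with two different charsets. The class name CharsetMatters, the helper decodeAs, and the sample input "caf%C3%A9" are all made up for this example. It uses the charset-name overload so that it also compiles on older Java versions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class CharsetMatters {

    // Hypothetical helper: decodes percent-escapes using the named charset.
    static String decodeAs(String escaped, String charsetName)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(escaped, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // %C3%A9 is the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
        String escaped = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes form one character (U+00E9).
        String utf8 = decodeAs(escaped, "UTF-8");

        // Decoded as ISO-8859-1, each byte becomes its own character
        // (U+00C3 followed by U+00A9), which is the familiar mojibake.
        String latin1 = decodeAs(escaped, "ISO-8859-1");

        System.out.println(utf8.length());   // 4 characters
        System.out.println(latin1.length()); // 5 characters
    }
}
```

If the string was escaped as UTF-8, decoding it with any other charset can silently produce the wrong characters instead of an error.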

freedev answered Sep 21 '22 13:09