When I read the xml through a URL's InputStream, and then cut out everything except the url, I get "http://cliveg.bu.edu/people/sganguly/player/%20Rang%20De%20Basanti%20-%20Tu%20Bin%20Bataye.mp3".
As you can see, there are a lot of "%20"s.
I want the url to be unescaped.
Is there any way to do this in Java, without using a third-party library?
Encode the URLprivate String encodeValue(String value) { return URLEncoder. encode(value, StandardCharsets. UTF_8. toString()); } @Test public void givenRequestParam_whenUTF8Scheme_thenEncode() throws Exception { Map<String, String> requestParams = new HashMap<>(); requestParams.
public class URLDecoder extends Object. Utility class for HTML form decoding. This class contains static methods for decoding a String from the application/x-www-form-urlencoded MIME format. The conversion process is the reverse of that used by the URLEncoder class.
Use one of the multi-argument constructors that takes the URL components as separate strings, and it'll escape each component correctly according to that component's rules. The toASCIIString() method gives you a properly-escaped and encoded string that you can send to a server.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
This is not unescaped XML, this is URL encoded text. Looks to me like you want to use the following on the URL strings.
URLDecoder.decode(url);
This will give you the correct text. The result of decoding the like you provided is this.
http://cliveg.bu.edu/people/sganguly/player/ Rang De Basanti - Tu Bin Bataye.mp3
The %20 is an escaped space character. To get the above I used the URLDecoder object.
URLDecoder.decode(url, StandardCharsets.UTF_8)
.for Java 7/8/9 use URLDecoder.decode(url, "UTF-8")
.
URLDecoder.decode(String s)
has been deprecated since Java 5
Regarding the chosen encoding:
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilites.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With