I am trying to download a file from a server using System.Web. It actually works, but some links give me trouble. The links look like this:
http://cdn.somesite.com/r1KH3Z%2FaMY6kLQ9Y4nVxYtlfrcewvKO9HLTCUBjU8IBAYnA3vzE1LGrkqMrR9Nh3jTMVFZzC7mxMBeNK5uY3nx5K0MjUaegM3crVpFNGk6a6TW6NJ3hnlvFuaugE65SQ4yM5754BM%2BLagqYvwvLAhG3DKU9SGUI54UAq3dwMDU%2BMl9lUO18hJF3OtzKiQfrC/the_file.ext
The code looks basically like this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(link);
WebResponse response = request.getResponse();
getResponse() always throws an exception (Error 400 Bad Request). However, I know the link works because I can download the file with Firefox without problems.
I also tried to decode the link with Uri.UnescapeDataString(link), but that link wont even work in Firefox.
Other links work perfectly fine this way.. just these won't work.
Okay, i found something out using wireshark:
If i open the link using Firefox, this is sent:
&ME3@"dM*PNyAo PA:]GET /r1KH3Z%2FaMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs4QgbudzcrJivrAaOTYkEnozqmdoSCCY8yb1i22YtEAV/epd_outpost_12adb.flv HTTP/1.1
Host: cdn.somesite.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
I think only the first line is the problem, because WebRequest.Create(link) decodes the url:
&MEz.@!dM/nP9@~P>.GET /r1KH3Z/aMY6kLQ9Y4nVxYp5DyNc49t5kJBybvjbcsJJZ0IUJBtBWCgri3zfTERQught6S8ws1a%2BCo0RS5w3KTmbL7i5yytRpn2QELEPUXZTGYWbAg5eyGO2yIIbmGOcFP41WdrFRFcfk4hAIyZ7rs6Mmh1EsQQ4vJVYUwtbLBDNx9AwCHlWDfzfSWIHzaaIo/epd_outpost_12adb.flv HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0
Host: cdn.somesite.com
( %2F is replaced with / )
I found out that the Uri class decodes the url automatically: Uri uri = new Uri(link); //link is not decoded Debug.WriteLine(uri.ToString()); //link is decoded here.
How can I prevent this?
Thanks in advance for your help.
By default, the Uri
class will not allow an escaped /
character (%2f
) in a URI (even though this appears to be legal in my reading of RFC 3986).
Uri uri = new Uri("http://example.com/embed%2fded");
Console.WriteLine(uri.AbsoluteUri); // prints: http://example.com/embed/ded
(Note: don't use Uri.ToString to print URIs.)
According to the bug report for this issue on Microsoft Connect, this behaviour is by design, but you can work around it by adding the following to your app.config or web.config file:
<uri>
<schemeSettings>
<add name="http" genericUriParserOptions="DontUnescapePathDotsAndSlashes" />
</schemeSettings>
</uri>
(Since WebRequest.Create(string)
just delegates to WebRequest.Create(Uri)
, you would need to use this workaround no matter which method you call.)
This has now changed in .NET 4.5. By default you can now use escaped slashes. I posted more info on this (including screenshots) in the comments here: GETting a URL with an url-encoded slash
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With