
HttpClient returning special characters but nothing readable

I am trying to download a web page using async/await and HttpClient, but all I get back is a string full of special characters. My code looks like this:

static async void DownloadPageAsync(string url)
{
    HttpClient client = new HttpClient();
    client.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
    client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
    client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
    client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
    HttpResponseMessage response = await client.GetAsync(url);
    response.EnsureSuccessStatusCode();
    var responseStream = await response.Content.ReadAsStreamAsync();
    var streamReader = new StreamReader(responseStream);
    var str = streamReader.ReadToEnd();

}

and the URL is

url = @"http://www.nseindia.com/live_market/dynaContent/live_watch/live_index_watch.htm";

When I used

client.DefaultRequestHeaders.Add("User-Agent",
                                 "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");

in place of those four DefaultRequestHeaders lines, I got a 403 error, even though this is the NSE site, which is free for everyone. How can I get the correct response? Regards,

Srivastava

asked Jun 17 '15 18:06 by Ashutosh


1 Answer

client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");

With this line you tell the server that it may compress the response using gzip or deflate. The response really is compressed, which is why the text you read looks like a run of special characters.

If you want plain text, don’t send that header, so the server won’t compress the response. With the line above removed, you get a normal HTML response.

Alternatively, you can of course keep that header in and decompress the response using GZipStream after receiving it. That would work like this:

// Needs: using System.IO; using System.IO.Compression;
using (var responseStream = await response.Content.ReadAsStreamAsync())
using (var gzipStream = new GZipStream(responseStream, CompressionMode.Decompress))
using (var streamReader = new StreamReader(gzipStream))
{
    // Assumes the server actually answered with Content-Encoding: gzip.
    var str = streamReader.ReadToEnd();
    Console.WriteLine(str);
}

Ideally, check the value of response.Content.Headers.GetValues("Content-Encoding") first to make sure the encoding really is gzip. Since you also accepted deflate as a possible encoding, use DeflateStream when the server chose that instead, and don’t decode anything if the Content-Encoding header is missing.
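
As a rough sketch of that check, the decompressor can be picked from the Content-Encoding value before the body is read. The names ContentDecoding, WrapForEncoding and ReadBody are made up for this illustration and are not part of HttpClient. The body is simulated with a MemoryStream so the example runs without a network call, and UTF-8 is assumed for the text. With a real response, the encoding value could be taken from response.Content.Headers.ContentEncoding.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class ContentDecoding
{
    // Hypothetical helper: wraps the raw body stream in the decompressor
    // that matches the Content-Encoding value, or returns it unchanged.
    public static Stream WrapForEncoding(Stream body, string contentEncoding)
    {
        switch ((contentEncoding ?? "").Trim().ToLowerInvariant())
        {
            case "gzip":
                return new GZipStream(body, CompressionMode.Decompress);
            case "deflate":
                // Caveat: some servers send zlib-wrapped data for "deflate",
                // which DeflateStream (raw deflate) may not accept.
                return new DeflateStream(body, CompressionMode.Decompress);
            default:
                // Header missing or unknown: read the bytes as they are.
                return body;
        }
    }

    // Hypothetical helper: decodes the body and reads it as UTF-8 text
    // (the charset is an assumption of this sketch).
    public static string ReadBody(Stream body, string contentEncoding)
    {
        using (var decoded = WrapForEncoding(body, contentEncoding))
        using (var reader = new StreamReader(decoded, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Simulate a gzip-compressed response body so the sketch runs offline.
        var compressed = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
        {
            var bytes = Encoding.UTF8.GetBytes("<html>hello</html>");
            gzip.Write(bytes, 0, bytes.Length);
        }
        compressed.Position = 0;

        Console.WriteLine(ReadBody(compressed, "gzip")); // prints <html>hello</html>
    }
}
```

If you would rather not handle this yourself, HttpClientHandler also has an AutomaticDecompression property that lets the handler decompress gzip and deflate responses for you.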

answered Oct 20 '22 19:10 by poke