How to get the full content from HttpWebResponse if the return content is Transfer-Encoding:chunked?

Tags: c#, .net, web

I am writing a program that downloads HTML pages from other websites. For some particular websites I cannot get the full HTML; I only get partial content. The servers with this problem send data with "Transfer-Encoding: chunked", and I suspect this is the cause of the problem.

This is the header information returned by the server:

Transfer-Encoding: chunked
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Type: text/html; charset=UTF-8
Date: Sun, 11 Sep 2011 09:46:23 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Server: nginx/1.0.6
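For reference, a chunked body arrives as a series of chunks: each chunk starts with its size in hexadecimal on its own CRLF-terminated line, followed by that many bytes of data and another CRLF, and a chunk of size 0 marks the end. In .NET, HttpWebResponse normally strips this framing before the data reaches GetResponseStream(), so application code does not usually parse it. The sketch below only illustrates the wire format; ChunkedDemo, DecodeChunked and IndexOfCrlf are hypothetical names, chunk extensions are skipped, and trailer headers are not parsed.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class ChunkedDemo
{
    // Hypothetical helper: decodes a raw "Transfer-Encoding: chunked" body.
    // Layout: "<hex size>\r\n<data>\r\n" repeated, ending with "0\r\n\r\n".
    public static byte[] DecodeChunked(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            int pos = 0;
            while (true)
            {
                int lineEnd = IndexOfCrlf(raw, pos);
                if (lineEnd < 0) throw new InvalidDataException("Missing chunk-size line.");

                // Size line is hex, optionally followed by ";extension", which is ignored here.
                string sizeLine = Encoding.ASCII.GetString(raw, pos, lineEnd - pos);
                int semicolon = sizeLine.IndexOf(';');
                if (semicolon >= 0) sizeLine = sizeLine.Substring(0, semicolon);
                int size = int.Parse(sizeLine.Trim(), NumberStyles.HexNumber, CultureInfo.InvariantCulture);

                pos = lineEnd + 2;            // skip the CRLF after the size line
                if (size == 0) break;         // last chunk: end of body (trailers not parsed)
                if (pos + size > raw.Length) throw new InvalidDataException("Truncated chunk data.");

                output.Write(raw, pos, size); // copy this chunk's payload
                pos += size + 2;              // skip the chunk data and its trailing CRLF
            }
            return output.ToArray();
        }
    }

    // Returns the index of the next "\r\n" at or after start, or -1 if none.
    static int IndexOfCrlf(byte[] data, int start)
    {
        for (int i = start; i + 1 < data.Length; i++)
        {
            if (data[i] == '\r' && data[i + 1] == '\n') return i;
        }
        return -1;
    }

    static void Main()
    {
        // Two chunks ("<html>\n" is 7 bytes, "</html>\n" is 8 bytes), then the final 0-size chunk.
        byte[] raw = Encoding.ASCII.GetBytes("7\r\n<html>\n\r\n8\r\n</html>\n\r\n0\r\n\r\n");
        Console.Write(Encoding.UTF8.GetString(ChunkedDemo.DecodeChunked(raw))); // prints <html> and </html> on two lines
    }
}
```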

Here is my code:

HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response;
CookieContainer cookie = new CookieContainer();
request.CookieContainer = cookie;
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.UserAgent =
    @"Mozilla/5.0 (Windows NT 6.1; rv:6.0.2) Gecko/20100101 Firefox/6.0.2 FirePHP/0.6";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
string html = string.Empty;
response = request.GetResponse() as HttpWebResponse;

using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    html = reader.ReadToEnd();
}

I can only get partial HTML (I think it is the first chunk from the server). Could anyone help? Any solution?

Thanks!

asked Sep 11 '11 10:09 by syking


1 Answer

You can't use ReadToEnd to read chunked data. Instead, read directly from the response stream with Stream.Read in a loop until it returns 0.

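using System.IO;
using System.Text;

byte[] buf = new byte[8192];
int count;
string html;

using (Stream resStream = response.GetResponseStream())
using (MemoryStream ms = new MemoryStream())
{
    do
    {
        count = resStream.Read(buf, 0, buf.Length);
        if (count != 0)
        {
            ms.Write(buf, 0, count); // collect raw bytes; decode once at the end
        }
    } while (count > 0);

    // Decoding each buffer separately could split a multi-byte UTF-8 character
    // across two reads, so decode the complete body in one call.
    html = Encoding.UTF8.GetString(ms.ToArray()); // just hardcoding UTF-8 here
}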
answered Sep 28 '22 05:09 by Strelok
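The same read loop can be packaged as a small reusable helper. This is a minimal sketch; StreamText and ReadAll are hypothetical names, and UTF-8 is assumed because the question's Content-Type header declares charset=UTF-8. It buffers the raw bytes and decodes them once at the end, because decoding each buffer separately can corrupt a multi-byte UTF-8 character whose bytes straddle two Read() calls. A MemoryStream stands in for response.GetResponseStream() so the example runs without a network connection.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamText
{
    // Hypothetical helper: reads a stream to the end with Stream.Read in a loop,
    // collecting raw bytes and decoding them in a single call at the end.
    public static string ReadAll(Stream stream, Encoding encoding, int bufferSize = 8192)
    {
        byte[] buf = new byte[bufferSize];
        using (var ms = new MemoryStream())
        {
            int count;
            // Read returns 0 only when the end of the stream is reached.
            while ((count = stream.Read(buf, 0, buf.Length)) > 0)
            {
                ms.Write(buf, 0, count);
            }
            return encoding.GetString(ms.ToArray());
        }
    }

    static void Main()
    {
        // Stand-in for response.GetResponseStream(): UTF-8 text with multi-byte characters.
        byte[] utf8 = Encoding.UTF8.GetBytes("héllo wörld");
        using (var fake = new MemoryStream(utf8))
        {
            // A 2-byte buffer splits the 2-byte 'é' across two Read() calls.
            string text = StreamText.ReadAll(fake, Encoding.UTF8, 2);
            Console.WriteLine(text == "héllo wörld"); // prints True
        }
    }
}
```

With a real response the call would look like `using (Stream s = response.GetResponseStream()) { html = StreamText.ReadAll(s, Encoding.UTF8); }`. Holding the whole body in memory is fine for ordinary HTML pages.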