
How to retrieve partial response with System.Net.HttpClient

I'm trying to use the new HttpClient class (in .NET 4.5) to retrieve partial responses from the server in order to check the content. I need to limit the data retrieved to the first few bytes of content in the HTTP responses to limit bandwidth usage.

I've been unable to accomplish this. I have tried using GetAsync(url, HttpCompletionOption.ResponseHeadersRead) and then Content.ReadAsStreamAsync(), so that only the headers are read up front, and then reading the response stream in a small chunk. I also tried GetStreamAsync() and then reading the content stream in a small chunk (1000 bytes).

In both cases it appears that HttpClient is pulling and buffering the entire HTTP response rather than just reading the requested byte count from the stream.

Initially I was using Fiddler to monitor the data, but realized that Fiddler itself might be causing the entire content to be proxied, so I switched to System.Net tracing, which shows:

ConnectStream#6044116::ConnectStream(Buffered 16712 bytes.)

which is the full size rather than just the 1000 bytes read. I've also double-checked in Wireshark, and the full content is indeed being pulled over the wire. With larger content (like a 110 KB page) I get about 20 KB of data before the TCP/IP stream is truncated.

The two ways I've tried to read the data:

var response = await client.GetAsync(site.Url, HttpCompletionOption.ResponseHeadersRead);
var stream = await response.Content.ReadAsStreamAsync();

var buffer = new byte[1000];
var count = await stream.ReadAsync(buffer, 0, buffer.Length);
response.Dispose();  // close ASAP (HttpResponseMessage has Dispose(), not Close())
result.LastResponse = Encoding.UTF8.GetString(buffer, 0, count);  // decode only the bytes read

and:

using (var stream = await client.GetStreamAsync(site.Url))
{
    var buffer = new byte[1000];
    var count = await stream.ReadAsync(buffer, 0, buffer.Length);
    result.LastResponse = Encoding.UTF8.GetString(buffer, 0, count);  // decode only the bytes read
}

Both of them produce nearly identical .NET traces, which include the buffered read.
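One detail worth noting in both snippets: a single ReadAsync call is allowed to return fewer bytes than requested, so code that wants "the first N bytes" normally loops. The sketch below uses hypothetical helper names (ReadAtMostAsync, ReadPrefixAsync); it reads until maxBytes have arrived or the stream ends, and disposes the response as soon as the prefix has been read. It does not change how much data the framework or the TCP stack has already buffered, which is what the question is about; it only keeps the application from reading past its limit.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static class PartialRead
{
    // Hypothetical helper: reads at most maxBytes from the stream, looping because
    // a single ReadAsync call may return fewer bytes than requested.
    public static async Task<byte[]> ReadAtMostAsync(Stream stream, int maxBytes)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes)
        {
            int read = await stream.ReadAsync(buffer, total, maxBytes - total);
            if (read == 0) break; // end of stream: the body was shorter than maxBytes
            total += read;
        }
        Array.Resize(ref buffer, total); // trim to the bytes actually received
        return buffer;
    }

    // Hypothetical helper: returns once the headers have arrived, reads only the
    // first maxBytes of the body, then disposes the response and its stream.
    public static async Task<string> ReadPrefixAsync(HttpClient client, string url, int maxBytes)
    {
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        using (var stream = await response.Content.ReadAsStreamAsync())
        {
            byte[] prefix = await ReadAtMostAsync(stream, maxBytes);
            // Caveat: a multi-byte UTF-8 character may be cut off at the end of the prefix
            return Encoding.UTF8.GetString(prefix);
        }
    }
}
```

Returning a trimmed array (rather than the fixed-size buffer) avoids the trailing zero bytes that decoding the whole 1000-byte buffer would otherwise produce.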

Is it possible to have HttpClient actually read only a small chunk of an HTTP response, rather than the entire response, in order not to use the full bandwidth? In other words, is there a way to disable any buffering on the HTTP connection using either HttpClient or HttpWebRequest?

Update: After some more extensive testing, it looks like both HttpClient and HttpWebRequest buffer the first few TCP/IP frames, presumably to ensure the HTTP header is captured. So if the server returns a small enough response, it tends to get loaded completely just because it fits in that initial buffered read. But when loading a larger content URL, the content does get truncated: for HttpClient at around 20k, for HttpWebRequest somewhere around 8k for me.

Using TcpClient doesn't have this buffering issue. With it, I get the requested read size plus a bit extra up to the nearest buffer boundary, though that does include the HTTP header. Using TcpClient is not really an option for me, as we have to deal with SSL, redirects, auth, chunked content, etc. At that point I'd be looking at implementing a full custom HTTP client just to turn off buffering.

asked Jan 09 '14 10:01 by Rick Strahl


2 Answers

The best way to achieve what you need is something like the following:

using System;
using System.Net.Sockets;
using System.Text;

namespace tcpclienttest
{
  class Program
  {
    static byte[] GetData(string server, string pageName, int byteCount, out int actualByteCountReceived)
    {
      const int port = 80;

      // 'using' closes the socket even if an exception is thrown
      using (TcpClient client = new TcpClient(server, port))
      using (NetworkStream stream = client.GetStream())
      {
        // HTTP uses CRLF line endings; a blank line ends the request headers.
        // "Connection: close" makes the server close the socket after a short
        // response, so the read loop below cannot block waiting for more data.
        string fullRequest = "GET " + pageName + " HTTP/1.1\r\n" +
                             "Host: " + server + "\r\n" +
                             "Connection: close\r\n\r\n";
        byte[] outputData = Encoding.ASCII.GetBytes(fullRequest);
        stream.Write(outputData, 0, outputData.Length);

        byte[] inputData = new byte[byteCount];

        // A single Read() may return fewer bytes than requested, so keep reading
        // until the buffer is full or the server closes the connection
        actualByteCountReceived = 0;
        while (actualByteCountReceived < byteCount)
        {
          int read = stream.Read(inputData, actualByteCountReceived, byteCount - actualByteCountReceived);
          if (read == 0) break;   // end of stream
          actualByteCountReceived += read;
        }

        // If you want the data as a string, set the function return type to string,
        // return 'responseData' rather than 'inputData' and uncomment the next line
        //string responseData = Encoding.ASCII.GetString(inputData, 0, actualByteCountReceived);

        return inputData;
      }
    }

    static void Main(string[] args)
    {
      int actualCount;
      const int requestedCount = 1024;
      const string server = "myserver.mydomain.com"; // NOTE: NO http:// or https:// bit, just domain or IP
      const string page = "/folder/page.ext";

      byte[] myPartialPage = GetData(server, page, requestedCount, out actualCount);
    }
  }
}

A couple of points to note, however:

There's NO error handling in there, so you might want to wrap it all in a try/catch or something to make sure you catch any connection errors, timeouts, unresolved host names, etc.

Because you're dealing with the raw stream, the HTTP headers are also in there, so you'll need to take them into account.

In theory, you could put a loop in just before the main socket read that keeps grabbing data until you get an empty line (a CRLF on its own); that tells you where the headers end, and you could then grab your actual count of data. Since I don't know the server you're talking to, I left that bit out :-)
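The header/body split described above can be sketched as a small helper that looks for the blank line ending the headers. HTTP/1.1 terminates each header line with CRLF, so the boundary is the byte sequence \r\n\r\n; the sketch also accepts a bare \n\n for lenient servers. FindBodyStart is a hypothetical name, and the sketch assumes the bytes received so far are already in an array (for example, the caller keeps calling Read and re-checking until the helper stops returning -1).

```csharp
static class HttpHeaderSplit
{
    // Hypothetical helper: returns the index of the first body byte, i.e. the
    // position just after the blank line that ends the HTTP headers, or -1 if
    // the end of the headers has not been received yet.
    public static int FindBodyStart(byte[] data, int length)
    {
        for (int i = 0; i < length; i++)
        {
            if (data[i] != (byte)'\n') continue;

            // CRLF CRLF: the standard HTTP/1.1 header terminator
            if (i >= 3 && data[i - 1] == '\r' && data[i - 2] == '\n' && data[i - 3] == '\r')
                return i + 1;

            // LF LF: tolerated by lenient parsers
            if (i >= 1 && data[i - 1] == '\n')
                return i + 1;
        }
        return -1;
    }
}
```

Once it returns an index, everything from that index onwards is body, so you can count body bytes from there when deciding how much more to read. If the server uses chunked transfer encoding, those body bytes will also contain the chunk-size lines.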

If you copy/paste the entire code into a new console project in VS, it's runnable as it is, so you can single-step it.

As far as I know, HttpClient doesn't make its raw stream available to the user, and even if it did, because it's allocated as a streaming connection you're not likely to have much control over its read count. I've looked into it before and given up.

I've used this code a number of times and it works well for me in similar cases, in fact I have a monitor that sits and gets stats from my WiFi adapter using it so I can see who's connecting.

Any questions, feel free to hit me up on here, or ping me on Twitter; my handle is @shawty_ds (just in case you lost it).

Shawty

answered Oct 08 '22 20:10 by shawty


I may be wrong about this, but I think you're getting confused: when you send the request to the server, it will send you the complete answer through the network. It is then buffered somewhere by the framework, and you access it using the stream. If you don't want the remote server to send you the full answer, you may be able to specify the range of bytes you want using HTTP headers; a server that supports this replies with 206 Partial Content, while one that doesn't is free to ignore the Range header and return 200 with the full body. See HTTP Status: 206 Partial Content and Range Requests for example.
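A Range request can be sketched with HttpClient's typed header support (System.Net.Http.Headers.RangeHeaderValue). The class and method names below are hypothetical, and the sketch assumes the server may or may not honour Range, so it still stops reading after count bytes in case it receives a 200 with the full body.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static class RangeFetch
{
    // Builds a GET request asking only for bytes [from, to] (inclusive) of the resource.
    public static HttpRequestMessage CreateRangeRequest(string url, long from, long to)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(from, to); // sends "Range: bytes=from-to"
        return request;
    }

    // Hypothetical helper: fetches the first 'count' bytes. A server that supports
    // ranges answers 206 Partial Content; one that ignores the header answers 200
    // with the full body, so this reads at most 'count' bytes either way.
    public static async Task<byte[]> FetchPrefixAsync(HttpClient client, string url, int count)
    {
        using (var request = CreateRangeRequest(url, 0, count - 1))
        using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode(); // both 200 and 206 count as success
            using (var stream = await response.Content.ReadAsStreamAsync())
            {
                var buffer = new byte[count];
                int total = 0, read;
                // Loop because a single ReadAsync may return fewer bytes than asked for
                while (total < count && (read = await stream.ReadAsync(buffer, total, count - total)) > 0)
                    total += read;
                Array.Resize(ref buffer, total);
                return buffer;
            }
        }
    }
}
```

Checking response.StatusCode against HttpStatusCode.PartialContent tells you whether the server actually honoured the range, which matters for bandwidth: only a 206 means the server stopped sending after the requested bytes.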

answered Oct 08 '22 19:10 by Marshall777