
Overriding WebHostBufferPolicySelector for Non-Buffered File Upload

In an attempt to create a non-buffered file upload I have extended System.Web.Http.WebHost.WebHostBufferPolicySelector, overriding function UseBufferedInputStream() as described in this article: http://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/. When a file is POSTed to my controller, I can see in trace output that the overridden function UseBufferedInputStream() is definitely returning FALSE as expected. However, using diagnostic tools I can see the memory growing as the file is being uploaded.

The heavy memory usage appears to occur in my custom MediaTypeFormatter (something like the FileMediaFormatter here: http://lonetechie.com/). It is in this formatter that I would like to incrementally write the incoming file to disk, but I also need to parse JSON and do some other operations with the Content-Type: multipart/form-data upload. Therefore I'm using the HttpContent extension method ReadAsMultipartAsync(), which appears to be the source of the memory growth. I have placed trace output before and after the "await", and memory usage increases fairly rapidly while the task is pending.

Once I find the file content in the parts returned by ReadAsMultipartAsync(), I use Stream.CopyTo() to write the file contents to disk. This writes to disk as expected, but unfortunately the whole file is already in memory by this point.

Does anyone have any thoughts about what might be going wrong? It seems that ReadAsMultipartAsync() is buffering the whole POST body; if that is true, why do we need var fileStream = await fileContent.ReadAsStreamAsync() to get the file contents? Is there another way to split the parts without reading them into memory? The code in my MediaTypeFormatter looks something like this:

// save the stream so we can seek/read again later
Stream stream = await content.ReadAsStreamAsync();  

// reject anything that is not multipart before trying to parse it
if (!content.IsMimeMultipartContent())
{
    throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
}

var parts = await content.ReadAsMultipartAsync(); // <- memory usage grows rapidly

//
// pull data out of parts.Contents, process json, etc.
//

// find the file data in the multipart contents
var fileContent = parts.Contents.FirstOrDefault(
    x => x.Headers.ContentDisposition.DispositionType.ToLower().Trim() == "form-data" &&
         x.Headers.ContentDisposition.Name.ToLower().Trim() == "\"" + DATA_CONTENT_DISPOSITION_NAME_FILE_CONTENTS + "\"");

// write the file to disk
using (var fileStream = await fileContent.ReadAsStreamAsync())
{
    using (FileStream toDisk = File.OpenWrite("myUploadedFile.bin"))
    {
        ((Stream)fileStream).CopyTo(toDisk);
    }
}
user1543181 asked Feb 16 '13 01:02


1 Answer

WebHostBufferPolicySelector only determines whether the underlying request stream is buffered. This is what Web API does under the hood:

IHostBufferPolicySelector policySelector = _bufferPolicySelector.Value;
bool isInputBuffered = policySelector == null ? true : policySelector.UseBufferedInputStream(httpContextBase);
Stream inputStream = isInputBuffered
    ? requestBase.InputStream
    : httpContextBase.ApplicationInstance.Request.GetBufferlessInputStream();

So if your implementation returns false, then the request is bufferless.
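
For reference, a minimal sketch of such a selector and its registration might look like the following. The class names NoBufferPolicySelector and BufferPolicyConfig are made up for illustration, and returning false for every request is an assumption; a real implementation would probably inspect the host context and only opt out of buffering for upload routes.

```csharp
using System.Web.Http;
using System.Web.Http.Hosting;
using System.Web.Http.WebHost;

// Hypothetical name. Opts every request out of input buffering.
public class NoBufferPolicySelector : WebHostBufferPolicySelector
{
    public override bool UseBufferedInputStream(object hostContext)
    {
        // false => Web API reads the body via GetBufferlessInputStream()
        return false;
    }
}

public static class BufferPolicyConfig
{
    // Hypothetical helper; call once at startup, e.g. from Application_Start
    // with GlobalConfiguration.Configuration.
    public static void Register(HttpConfiguration config)
    {
        config.Services.Replace(typeof(IHostBufferPolicySelector), new NoBufferPolicySelector());
    }
}
```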

However, ReadAsMultipartAsync() loads every part into a MemoryStream, because if you don't specify a provider it defaults to MultipartMemoryStreamProvider.

To save the files to disk automatically as each part is processed, use MultipartFormDataStreamProvider (if you deal with files and form data) or MultipartFileStreamProvider (if you deal with just files).

There is an example on asp.net or here. In these examples everything happens in controllers, but there is no reason why you couldn't use it elsewhere, e.g. in a formatter.
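
A rough sketch of the MultipartFormDataStreamProvider approach, written as a standalone helper so it can be called from a controller or a formatter. The names MultipartUploadReader and SaveUploadAsync are hypothetical, and rootPath is assumed to be an existing, writable directory.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class MultipartUploadReader
{
    // Hypothetical helper: parts that carry a filename are written to files
    // under rootPath as they are read; the remaining parts are collected as form data.
    public static async Task<MultipartFormDataStreamProvider> SaveUploadAsync(
        HttpContent content, string rootPath)
    {
        if (!content.IsMimeMultipartContent())
        {
            throw new InvalidOperationException("Expected multipart content.");
        }

        var provider = new MultipartFormDataStreamProvider(rootPath);

        // Passing a provider replaces the default MultipartMemoryStreamProvider.
        await content.ReadAsMultipartAsync(provider);
        return provider;
    }
}
```

After the call, provider.FormData holds the non-file fields (a JSON field could be parsed from there) and each entry in provider.FileData exposes LocalFileName, the path of the file that was written to disk.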

Another option, if you really want to play with streams, is to implement a custom class inheriting from MultipartStreamProvider that fires whatever processing you want as soon as it grabs part of the stream. The usage would be similar to the aforementioned providers: you'd need to pass it to the ReadAsMultipartAsync(provider) method.
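
As an illustration of the custom provider idea, here is a minimal sketch that decides per part where the bytes go. The class name DiskOrMemoryStreamProvider, the FilePaths property and the "has a filename" test are all assumptions made for the example; rootPath is assumed to exist.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical provider: parts with a filename are streamed to disk,
// everything else is kept in memory so it can be read from Contents.
public class DiskOrMemoryStreamProvider : MultipartStreamProvider
{
    private readonly string _rootPath;
    private readonly List<string> _filePaths = new List<string>();

    public DiskOrMemoryStreamProvider(string rootPath)
    {
        _rootPath = rootPath;
    }

    // Paths of the files created so far, in the order the parts arrived.
    public IList<string> FilePaths
    {
        get { return _filePaths; }
    }

    // Called once per body part, before its bytes are read.
    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        ContentDispositionHeaderValue disposition = headers.ContentDisposition;
        bool isFile = disposition != null && !string.IsNullOrEmpty(disposition.FileName);
        if (!isFile)
        {
            return new MemoryStream();
        }

        string path = Path.Combine(_rootPath, Path.GetRandomFileName());
        _filePaths.Add(path);
        return File.Create(path);
    }
}
```

It would be used as var provider = await content.ReadAsMultipartAsync(new DiskOrMemoryStreamProvider(rootPath)). Any Stream subclass can be returned from GetStream, so the same hook could hash or transform the data while it is being written.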

Finally - if you are feeling suicidal - since the underlying request stream is bufferless theoretically you could use something like this in your controller or formatter:

Stream stream = HttpContext.Current.Request.GetBufferlessInputStream();
byte[] b = new byte[32 * 1024];
int n; // number of bytes read in the current iteration
while ((n = stream.Read(b, 0, b.Length)) > 0)
{
    // do stuff with the first n bytes of b
}

But of course that's very, for lack of a better word, "ghetto."

Filip W answered Sep 28 '22 03:09