
Jersey multipart streaming without disk buffering at the receiving server

I'm trying to stream (large) files over HTTP into a database. I'm using Tomcat with Jersey as the web framework. I noticed that if I POST a file to my resource, the file is first buffered on disk (in temp\MIME*.tmp) before it is handled in my doPOST method.

This is really undesirable behaviour, since it doubles disk I/O and also leads to a rather bad UX: even after the browser has finished uploading, the user has to wait a few minutes (depending on file size, of course) until they get the HTTP response.

I know that this is probably not the best implementation of a large file upload (since there is not even any resume capability), but those are the requirements. :/

So my question is whether there is any way to disable (disk) buffering for multipart POSTs. Memory buffering is obviously too expensive, but I don't really see the need for disk buffering either. Could someone explain why it is needed? How do large sites like YouTube handle this situation? Or is there at least a way to give the user immediate feedback once the file has been sent? (That would probably be a bad idea, since something like an SQLException could still occur afterwards.)

tobi.b asked May 16 '12 09:05


4 Answers

OK, so after days of reading and trying different things I stumbled upon HttpServletRequest. At first I didn't even want to try it, since it takes away all the convenience of @FormDataParam, but I didn't know what else to do...

Turns out it helped. When I inject @Context HttpServletRequest request and read from request.getInputStream(), I don't get any disk buffering at all.
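For anyone wondering what "using the stream directly" might look like, here is a minimal sketch that only uses the JDK. `StreamCopy` and `copy` are hypothetical names, not part of Jersey or the servlet API; in a real resource `in` would presumably be `request.getInputStream()` and `out` whatever sink you write to. Keep in mind that the raw request body still contains the multipart boundaries and part headers, so this alone does not give you the bare file content.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not part of Jersey or the servlet API. */
final class StreamCopy {
    private StreamCopy() {}

    /**
     * Copies all bytes from in to out through one fixed-size buffer, so memory
     * use stays constant no matter how large the upload is.
     * Returns the number of bytes copied.
     */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the end of the stream has been reached
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

Neither stream is closed here, on the assumption that the caller (or the container) owns them.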

Now I just have to figure out how to get at the FormDataContentDisposition without @FormDataParam.

Edit:

OK. MultiPartFormData probably has to buffer on disk in order to parse the InputStream of the request. So it seems I have to parse it manually if I want to prevent any buffering :(
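If you do end up parsing the parts by hand, the information FormDataContentDisposition would have given you (field name, file name) sits in each part's Content-Disposition header. Below is a rough sketch of pulling the parameters out of such a header value with plain JDK classes. `ContentDispositionParser` is a made-up name, and the parser is deliberately simplistic: it assumes values are either bare tokens or double-quoted strings without escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not a Jersey class. */
final class ContentDispositionParser {
    // Matches "; key=token" or "; key="quoted value"" (no escaped quotes supported)
    private static final Pattern PARAM =
        Pattern.compile(";\\s*([A-Za-z0-9_*-]+)\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))");

    private ContentDispositionParser() {}

    /** Returns the header parameters (lower-cased keys) in order of appearance. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher matcher = PARAM.matcher(headerValue);
        while (matcher.find()) {
            String key = matcher.group(1).toLowerCase(Locale.ROOT);
            // group(2) is set for quoted values, group(3) for bare tokens
            String value = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            params.put(key, value);
        }
        return params;
    }
}
```

For a header value like `form-data; name="file"; filename="big video.mp4"`, the resulting map holds the `name` and `filename` entries.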

tobi.b answered Nov 10 '22 20:11


In case anybody is still interested, I solved the same issue by using the Apache Commons FileUpload Streaming API.

The code example on that page worked just fine for me.

Emre Colak answered Nov 10 '22 21:11


Your best bet is to take full control and write your own servlet that just grabs request.getInputStream() (or request.getReader() if you are consuming text) and does the streaming itself. Most frameworks make your life "easy" by handling the upload, temporary storage, etc. for you, and that often makes things like streaming difficult. It's quite easy to grab the stream yourself and do whatever you want with it.

Christopher Schultz answered Nov 10 '22 21:11


I'm pretty sure Jersey writes the files to disk to make sure memory is not flooded. Since you know exactly what you need to do with the incoming data (stream it into the database), you probably have to write your own MessageBodyReader and get Jersey to use it to process your incoming multipart data.
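As for the "stream it into the database" half: however you end up obtaining the InputStream, JDBC can take it directly via PreparedStatement.setBinaryStream. Here is a minimal sketch, assuming a hypothetical table `uploads` with columns `file_name` and `content`; `UploadDao` is a made-up name too. Whether the bytes are really streamed to the server or buffered by the driver first depends on the JDBC driver, so treat this as a starting point and not a guarantee.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO for illustration; table and column names are assumptions. */
final class UploadDao {
    private UploadDao() {}

    /**
     * Inserts one row and hands the stream to the driver, so this code never
     * holds the whole file in memory itself. Returns the affected row count.
     */
    static int store(Connection connection, String fileName, InputStream content)
            throws SQLException {
        String sql = "INSERT INTO uploads (file_name, content) VALUES (?, ?)";
        // try-with-resources closes the statement; the connection stays open
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, fileName);
            // The driver reads from the stream until end-of-stream is reached
            statement.setBinaryStream(2, content);
            return statement.executeUpdate();
        }
    }
}
```

Any SQLException surfaces only once the insert has run, which is why telling the user "done" before that point would be risky.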

Paul Jowett answered Nov 10 '22 20:11