I'm trying to stream (large) files over HTTP into a database. I'm using Tomcat and Jersey as the web framework. I noticed that if I POST a file to my resource, the file is first buffered on disk (in temp\MIME*.tmp) before it is handled in my doPOST method.
This is really undesired behaviour, since it doubles disk I/O and also leads to a somewhat bad UX: once the browser has finished uploading, the user still has to wait a few minutes (depending on file size, of course) for the HTTP response.
I know that it's probably not the best implementation of a large file upload (since you don't even have any resume capabilities), but those are the requirements. :/
So my question is whether there is any way to disable (disk) buffering for multipart POSTs. Memory buffering is obviously too expensive, but I don't really see the need for disk buffering either. (Please explain.) How do large sites like YouTube handle this situation? Or is there at least a way to give the user immediate feedback once the file has been sent? (That would probably be bad, since something like an SQLException could still occur afterwards.)
Ok, so after days of reading and trying different stuff I stumbled upon HttpServletRequest. At first I didn't even want to try it, since it takes away all the convenience of @FormDataParam, but since I didn't know what else to do...
Turns out it helped. When I use @Context HttpServletRequest request and request.getInputStream(), I don't get disk buffering at all.
Now I just have to figure out how to get to the FormDataContentDisposition without @FormDataParam.
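For the content-disposition part, here is a minimal sketch of pulling the parameters out of the raw header values by hand. It assumes the boundary is taken from the request's Content-Type header and that the Content-Disposition line of the part has already been read from the stream. The class and method names (HeaderParamsSketch, parseParams) are made up for illustration and are not part of Jersey or the servlet API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeaderParamsSketch {

    /**
     * Splits a header value such as
     *   form-data; name="file"; filename="report.pdf"
     * into its parameters. Simplified on purpose: it does not handle escaped
     * quotes, semicolons inside quoted values, or RFC 5987 "filename*" forms.
     */
    static Map<String, String> parseParams(String headerValue) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] pieces = headerValue.split(";");
        // pieces[0] is the type itself ("form-data", "multipart/form-data"), so skip it.
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i].trim();
            int eq = piece.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value pair, ignore
            }
            String key = piece.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = piece.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Example header values; the boundary string is invented for the demo.
        Map<String, String> type =
                parseParams("multipart/form-data; boundary=----DemoBoundary123");
        Map<String, String> disposition =
                parseParams("form-data; name=\"file\"; filename=\"report.pdf\"");
        System.out.println("boundary = " + type.get("boundary"));
        System.out.println("field    = " + disposition.get("name"));
        System.out.println("filename = " + disposition.get("filename"));
    }
}
```

In a resource that injects the request via @Context, the first call would take request.getContentType() as input; the second needs the header line that follows the boundary inside the body.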
Edit:
Ok. MultiPartFormData probably has to buffer on disk to parse the InputStream of the request. So it seems I have to parse it manually if I want to prevent any buffering :(
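To give an idea of what parsing it manually involves, here is a rough sketch of the core step: copying one part's body to a sink while watching for the boundary, holding back only a handful of bytes at a time. It assumes the part headers have already been consumed, so the stream is positioned at the first body byte, and that the boundary token comes from the request's Content-Type header. PartBodyCopySketch and copyUntilBoundary are hypothetical names, and this is not how Jersey does it internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class PartBodyCopySketch {

    /**
     * Copies bytes from in to out until the delimiter (CRLF, "--", boundary)
     * is seen, and returns the number of body bytes written. At most
     * delimiter.length bytes are held back, so memory use does not grow with
     * the size of the upload. Pass a buffered stream: this reads byte by byte.
     */
    static long copyUntilBoundary(InputStream in, String boundary, OutputStream out)
            throws IOException {
        byte[] delimiter = ("\r\n--" + boundary).getBytes(StandardCharsets.US_ASCII);
        int matched = 0;   // number of delimiter bytes matched by the latest input
        long written = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == delimiter[matched]) {
                if (++matched == delimiter.length) {
                    return written;      // delimiter consumed, body is complete
                }
                continue;
            }
            // Mismatch: the held-back bytes were body data after all.
            // CR occurs only at delimiter[0] (a boundary cannot contain CR),
            // so no shorter partial match can survive and a restart is safe.
            out.write(delimiter, 0, matched);
            written += matched;
            matched = 0;
            if (b == delimiter[0]) {
                matched = 1;             // this byte may start a new delimiter
            } else {
                out.write(b);
                written++;
            }
        }
        throw new IOException("Stream ended before the closing boundary");
    }

    public static void main(String[] args) throws IOException {
        // Invented body with a look-alike line that is NOT the boundary.
        String raw = "line one\r\n--not the boundary\r\nline two\r\n--XyZ--\r\n";
        InputStream in = new ByteArrayInputStream(raw.getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        long n = copyUntilBoundary(in, "XyZ", body);
        System.out.println(n + " bytes copied:");
        System.out.println(new String(body.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

In a real upload the OutputStream would be whatever feeds the database, and the surrounding loop would still have to read the first boundary line, the part headers and the trailing "--" of the final boundary.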
In case anybody is still interested, I solved the same issue by using the Apache Commons FileUpload Streaming API.
The code example on that page worked just fine for me.
Your best bet is to take full control and write your own servlet that just grabs request.getInputStream() (or request.getReader() if you are consuming text) and does the streaming itself. Most frameworks make your life "easy" by handling all the upload, temporary storage, etc. for you, and often make it difficult to do things like streaming. It's quite easy to grab the stream yourself and do whatever you want.
I'm pretty sure Jersey writes the files to disk to ensure memory is not flooded. Since you know exactly what you need to do with the incoming data (stream it into the database), you probably have to write your own MessageBodyReader and get Jersey to use it to process your incoming multipart data.
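However the raw stream is obtained, the database end of "stream into the database" could look roughly like the sketch below, which hands the InputStream straight to JDBC via PreparedStatement.setBinaryStream. The table and column names (uploads, file_name, content) and the class name BlobInsertSketch are assumptions for illustration only. Whether the bytes are really streamed or buffered in memory first depends on the JDBC driver, so this is worth checking against the driver in use.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BlobInsertSketch {

    // Hypothetical table and columns; adjust to the real schema.
    static final String SQL =
            "INSERT INTO uploads (file_name, content) VALUES (?, ?)";

    /**
     * Inserts one row whose binary column is fed directly from the given
     * stream, without collecting the upload in a byte[] or a temp file first.
     * Returns the number of rows inserted.
     */
    static int insert(Connection con, String fileName, InputStream body)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            ps.setString(1, fileName);
            // JDBC 4.0 overload without a length: the driver reads until end of stream.
            ps.setBinaryStream(2, body);
            return ps.executeUpdate();
        }
    }
}
```

A failure such as an SQLException surfaces only when executeUpdate() runs, which is after the upload has been read, so the HTTP response still cannot be sent before the insert has finished.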