I've been building REST APIs for some time now, and one case still bugs me: large file upload. I've read through a couple of other APIs, like Google Drive and Twitter, plus other literature, and I came away with two ideas, but I'm not sure whether either of them is "proper". By proper I mean: it is somewhat standardized, it doesn't require too much client logic (since other parties will be implementing that client), and ideally it can easily be called with cURL. The plan is to implement it in Java, preferably with the Play Framework.
Obviously I'll need some file partitioning and a server-side buffering mechanism, since the files are large.
So, the first solution I've got is a multipart upload (multipart/form-data). I get this approach and I have implemented it before, but it always feels strange to me to actually emulate a form on the client side, especially since the client has to set the file key name, and in my experience that is something clients tend to forget or not understand. Also, how is the chunk/part size dictated? What keeps the client from putting the whole file into a single part?
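To be concrete, the server side of that multipart approach ends up looking roughly like this in Play's Java API (a minimal sketch only, written against the Play 2.5-era API, which differs slightly between versions; it assumes the client sends the file under the part name "file", e.g. curl -F "file=@big.bin" http://localhost:9000/upload):

```java
import play.mvc.Controller;
import play.mvc.Http;
import play.mvc.Result;

import java.io.File;

public class UploadController extends Controller {

    // Handles multipart/form-data POSTs; Play buffers each part to a temporary file on disk.
    public Result upload() {
        Http.MultipartFormData<File> body = request().body().asMultipartFormData();
        Http.MultipartFormData.FilePart<File> part = (body == null) ? null : body.getFile("file");
        if (part == null) {
            // Exactly the failure mode mentioned above: wrong or missing part name.
            return badRequest("Expected multipart/form-data with a part named \"file\"");
        }
        File uploaded = part.getFile();           // temporary file written by Play
        String originalName = part.getFilename(); // filename supplied by the client
        // ... move/stream `uploaded` to permanent storage here ...
        return ok("Received " + originalName + " (" + uploaded.length() + " bytes)");
    }
}
```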
Solution two, at least as I understood it (though without finding an actual implementation), is that a "regular" POST request can work: the content is sent chunked and the data is buffered on the server side. However, I am not sure this is the right understanding. How is the data actually chunked: does the upload span multiple HTTP requests, or is it chunked at the TCP level? And what should the Content-Type be?
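To make this concrete, the closest I've come up with is a single POST streamed with HTTP chunked transfer encoding, roughly like the sketch below (placeholder URL and file name; I'm guessing at application/octet-stream for the Content-Type):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChunkedPost {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("big.bin");                          // file to upload (placeholder)
        URL url = new URL("http://localhost:9000/files/big.bin");  // placeholder endpoint

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        // Send the body with Transfer-Encoding: chunked instead of buffering it
        // all in memory just to compute a Content-Length; 64 KiB per chunk here.
        conn.setChunkedStreamingMode(64 * 1024);

        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);   // streams the file, never holding it fully in memory
        }
        System.out.println("Server responded: " + conn.getResponseCode());
    }
}
```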
Bottom line: which of these two (or something else entirely?) would be a client-friendly, widely understood way of implementing a REST API for file upload?
Possible solutions: 1) Configure maximum upload file size and memory limits on your server. 2) Upload large files in chunks. 3) Use resumable file uploads. Chunking is the most commonly used approach for avoiding errors and increasing throughput.
https://tus.io/ is a resumable upload protocol that handles chunked uploading and resuming an upload after a timeout or failure. It is an open protocol, and various open-source client and server implementations already exist in different languages.
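To give a feel for the protocol, here is a rough sketch of the core tus 1.0 exchange using Java 11's HttpClient (placeholder server URL, single PATCH; a real client would split the file into chunks, handle failures, and use a HEAD request to discover the current offset when resuming):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class TusSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        Path file = Path.of("big.bin");                          // placeholder file
        long length = Files.size(file);
        URI base = URI.create("http://localhost:1080/files/");  // placeholder tus endpoint

        // 1) Creation: POST to the upload collection; the server answers 201 Created
        //    with a Location header pointing at the new upload resource.
        HttpRequest create = HttpRequest.newBuilder(base)
                .header("Tus-Resumable", "1.0.0")
                .header("Upload-Length", Long.toString(length))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> created = http.send(create, HttpResponse.BodyHandlers.discarding());
        URI uploadUrl = base.resolve(created.headers().firstValue("Location").orElseThrow());

        // 2) Upload: PATCH the bytes to the upload resource, declaring the offset this
        //    data starts at. When resuming, a HEAD request returns the server's current
        //    Upload-Offset so the client knows where to continue from.
        HttpRequest patch = HttpRequest.newBuilder(uploadUrl)
                .header("Tus-Resumable", "1.0.0")
                .header("Upload-Offset", "0")
                .header("Content-Type", "application/offset+octet-stream")
                .method("PATCH", HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<Void> done = http.send(patch, HttpResponse.BodyHandlers.discarding());
        System.out.println("PATCH status: " + done.statusCode());  // 204 No Content on success
    }
}
```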
I would recommend taking a look at the Amazon S3 REST API's solution to multipart file upload. The documentation can be found here.
To summarize the procedure Amazon uses:
The client sends a request to initiate a multipart upload, and the API responds with an upload ID.
The client uploads each file chunk with a part number (to maintain the ordering of the file), the size of the part, the MD5 hash of the part, and the upload ID; each of these is a separate HTTP request. The API validates the chunk by checking that the MD5 hash of the received chunk matches the hash the client supplied and that the size of the chunk matches the size the client supplied. The API responds with a tag (unique ID) for the chunk. If you deploy your API across multiple locations, you will need to consider how to store the chunks and later access them in a way that is location-transparent.
The client issues a request to complete the upload, which contains a list of each chunk number and the associated chunk tag (unique ID) received from the API. The API validates that there are no missing chunks and that each chunk number matches the correct chunk tag, then assembles the file or returns an error response (a rough sketch of this completion step follows below).
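Here is a minimal sketch of that completion step, assuming parts were validated and stored when they were uploaded, keyed by upload ID and part number (every name here is a placeholder of mine, not anything Amazon specifies):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class CompleteUpload {

    /** One entry from the client's "complete" request: a part number plus the tag we issued for it. */
    public record PartRef(int partNumber, String tag) {}

    /**
     * Validates the client's part list against what was actually stored, then
     * concatenates the parts in ascending part-number order into the final file.
     *
     * @param storedTags   tag issued per part number when the part was uploaded
     * @param storedParts  path of the stored chunk per part number
     * @param claimedParts the list the client sent in the completion request
     * @param target       where the assembled file should be written
     */
    public static Path complete(Map<Integer, String> storedTags,
                                Map<Integer, Path> storedParts,
                                List<PartRef> claimedParts,
                                Path target) throws IOException {
        // 1) Every claimed part must exist and its tag must match the one we issued.
        for (PartRef ref : claimedParts) {
            String issued = storedTags.get(ref.partNumber());
            if (issued == null || !issued.equals(ref.tag())) {
                throw new IllegalStateException("Missing or mismatched part " + ref.partNumber());
            }
        }
        // 2) Assemble the parts in order.
        try (OutputStream out = Files.newOutputStream(target)) {
            claimedParts.stream()
                    .sorted(Comparator.comparingInt(PartRef::partNumber))
                    .forEach(ref -> {
                        try (InputStream in = Files.newInputStream(storedParts.get(ref.partNumber()))) {
                            in.transferTo(out);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
        }
        return target;
    }
}
```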
Amazon also supplies methods to abort the upload and to list the chunks associated with an upload. You may also want to consider a timeout for the upload, so the chunks are destroyed if the upload is not completed within a certain amount of time.
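For that timeout, one simple option (again just a sketch with made-up names) is a periodic job that discards uploads that have been idle for too long:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UploadReaper {
    private static final Duration MAX_IDLE = Duration.ofHours(24);

    // uploadId -> time the last part was received (updated by the part-upload endpoint)
    private final Map<String, Instant> lastActivity = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::reap, 1, 1, TimeUnit.HOURS);
    }

    private void reap() {
        Instant cutoff = Instant.now().minus(MAX_IDLE);
        lastActivity.forEach((uploadId, lastSeen) -> {
            if (lastSeen.isBefore(cutoff)) {
                lastActivity.remove(uploadId);
                deleteStoredChunks(uploadId);   // placeholder: remove this upload's chunks from storage
            }
        });
    }

    private void deleteStoredChunks(String uploadId) {
        // delete the chunk files / objects associated with this abandoned upload
    }
}
```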
In terms of controlling the chunk sizes the client uploads, you won't have much control over how the client decides to split up the file. You could configure a maximum chunk size for the upload and return an error response for any request containing a chunk larger than that maximum.
I've found this procedure works very well for handling large file uploads in REST APIs and makes it easier to deal with the many edge cases associated with file upload. Unfortunately, I've yet to find a library that makes it easy to implement in any language, so you pretty much have to write all of the logic yourself.