This is the first time I am building a web app for the sole purpose of processing user-uploaded files, and I have a few questions about how this is normally done:
Are there any security issues that I have to take into account? The files to be processed are in essence text files that my app will read line by line. Should I limit the allowed file upload extensions, and/or are there any other precautions I should take?
What is the best organization method for uploaded files? These files do not need to be stored permanently in my app so should I just dump them in a general "Data" folder and delete whatever is no longer needed?
Are there any other important aspects of building web apps with similar functionality that I've missed?
Thanks
The two controls used to upload files to a web server are HtmlInputFile (an HTML server control) and FileUpload (an ASP.NET server control).
The SaveAs method saves the contents of an uploaded file to a specified path on the Web server.
HTML allows you to add file upload functionality to your website by adding a file upload button to your webpage with the help of the <input> tag. <input type="file"> defines a file-select field and a "Browse" button for file uploads.
What is IFormFile? ASP.NET Core has introduced an IFormFile interface that represents transmitted files in an HTTP request. The interface gives us access to metadata like ContentDisposition, ContentType, Length, FileName, and more. IFormFile also provides some methods used to store files.
The main security issue you have to watch for is inserting the raw text into the database without scrubbing it first (or, better, without using parameterized queries), which opens you up to SQL injection. If there is no database involved, you should be fine. As for extensions, limiting them is really a poor top-level filter. It's good to have, but it only looks skin-deep at what the file really contains. A file size limit would also help.
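As a rough illustration of the extension and size checks mentioned above, here is a minimal sketch. The module name, the function name, the allowed extensions and the 5 MB limit are all assumptions made for the example, not anything ASP.NET prescribes.

```vbnet
Imports System

Module UploadChecks
    ' Hypothetical limits; tune them for your own application.
    Private ReadOnly AllowedExtensions As String() = {".txt", ".dat"}
    Private Const MaxUploadBytes As Long = 5L * 1024L * 1024L ' 5 MB

    ' Returns True when the file name ends with an allowed extension and the
    ' reported length is non-zero and within the size limit.
    ' This inspects only the name and size, never the actual content.
    Public Function IsAcceptableUpload(fileName As String, lengthInBytes As Long) As Boolean
        If String.IsNullOrEmpty(fileName) Then Return False
        If lengthInBytes <= 0 OrElse lengthInBytes > MaxUploadBytes Then Return False

        For Each allowed As String In AllowedExtensions
            If fileName.EndsWith(allowed, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
        Next
        Return False
    End Function
End Module
```

With a FileUpload control this could be called as IsAcceptableUpload(fileUploadControl.FileName, fileUploadControl.PostedFile.ContentLength) before doing anything else with the upload.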
Saving to disk can be costly with a large number of transactions, but on the other hand it clutters your server memory less as more requests and threads come in. You can also work with the files in memory, but for large files that may end up being detrimental. Consider what you're working with and choose the best approach.
Define a timeout so that an oversized upload doesn't tie up server processes needlessly when it is going to be rejected as too large anyway.
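In classic ASP.NET, both the request size limit and the timeout can be set on the httpRuntime element in web.config. The fragment below is only a sketch: the values shown are illustrative, and it assumes a Web Forms application on the full .NET Framework.

```xml
<configuration>
  <system.web>
    <!-- maxRequestLength is in kilobytes; executionTimeout is in seconds. -->
    <httpRuntime maxRequestLength="4096" executionTimeout="110" />
  </system.web>
</configuration>
```

Note that executionTimeout is only enforced when the compilation element has debug="false".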
I am assuming that you're working with ASP.NET's FileUpload control. Bear in mind that the selected file does not persist through postbacks (to prevent a security loophole), so the user has to browse to the file again each time the page is posted back. This is a nuisance if you have server-side validators.
Edited to answer comment:
By working in-memory, I am talking about manipulating the file uploaded purely through code without resorting to saving it physically on the server's disk.
For instance, if you're using a FileUpload control, then the user's file can be accessed through a Stream object, FileUpload.FileContent, or as a byte array, FileUpload.FileBytes (API Reference). Since FileContent is a Stream, you can just read the file on the fly without having to save it first.
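To make the "read it on the fly" idea concrete, here is a small sketch that reads a stream line by line with a StreamReader. The module and function names are made up for the example, and UTF-8 is an assumption about how the uploads are encoded.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module UploadReader
    ' Reads every line from a stream without writing anything to disk.
    ' UTF-8 is assumed here; pass a different Encoding if your files differ.
    ' Disposing the reader at End Using also closes the underlying stream.
    Public Function ReadAllLinesFromStream(source As Stream) As List(Of String)
        Dim lines As New List(Of String)()
        Using reader As New StreamReader(source, Encoding.UTF8)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                lines.Add(line)
                line = reader.ReadLine()
            End While
        End Using
        Return lines
    End Function
End Module
```

In a page this could be called as ReadAllLinesFromStream(fileUploadControl.FileContent). For very large files you would probably process each line inside the loop instead of collecting them all in a list.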
Markup:
<asp:FileUpload ID="fileUploadControl" ToolTip="Upload a file" runat="server" />
Codebehind:
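' Accept only .txt and .dat uploads (a shallow check on the file name).
If fileUploadControl.HasFile AndAlso _
    (fileUploadControl.FileName.ToLower().EndsWith(".txt") OrElse _
    fileUploadControl.FileName.ToLower().EndsWith(".dat")) Then
    ' SaveThisToDataBase stands in for your own persistence routine.
    SaveThisToDataBase(fileUploadControl.FileName, fileUploadControl.FileBytes)
End If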
See? No need to save to the disk at all. fileUploadControl.FileBytes contains a byte array of the data uploaded.
If you wanted to save to a file, then you can just use the stream to write to the disk.
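A minimal sketch of writing the stream to disk might look like this. The module and method names are hypothetical, and Stream.CopyTo requires .NET 4.0 or later.

```vbnet
Imports System.IO

Module UploadSaver
    ' Copies the uploaded stream to targetPath, overwriting any existing file.
    ' The caller is responsible for choosing a safe target path.
    Public Sub SaveStreamToDisk(source As Stream, targetPath As String)
        Using target As FileStream = File.Create(targetPath)
            source.CopyTo(target)
        End Using
    End Sub
End Module
```

For example, SaveStreamToDisk(fileUploadControl.FileContent, somePath). The FileUpload control's own SaveAs method does the same job in one call if you don't need to touch the stream yourself.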
I don't know how 'standard' my answer is but here's what I did when I had a similar setup:
I limited the file extensions to a handful of file types, just to make it harder to upload bad files. It's easy to circumvent but at least it's one more step a malicious user would have to take.
I had to add write permissions to the IUSR account under IIS to the folder where I stored the files. This folder was a subfolder of my application's root folder.
I had to deal with a lot of files, so I created a new subfolder for each month, like Uploaded\012012, Uploaded\022012, etc. This made file access faster since I only had a few hundred files in each folder. I stored each upload in the database and had a scheduled task to clean up the file system regularly. This also deleted old empty folders.
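The monthly folders and the cleanup task could be sketched roughly as follows. This is not the code I actually used: the module, the function names and the cutoff parameter are invented for the example, and it assumes files sit exactly one level below the root folder.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO

Module UploadFolders
    ' Builds a per-month folder path such as Uploaded\012012 (month, then year).
    Public Function GetMonthlyFolder(root As String, uploadedOn As DateTime) As String
        Return Path.Combine(root, uploadedOn.ToString("MMyyyy", CultureInfo.InvariantCulture))
    End Function

    ' Deletes files last written before the cutoff, then removes any
    ' month folder that has been left completely empty.
    Public Sub CleanUp(root As String, cutoff As DateTime)
        For Each folder As String In Directory.GetDirectories(root)
            For Each filePath As String In Directory.GetFiles(folder)
                If File.GetLastWriteTime(filePath) < cutoff Then
                    File.Delete(filePath)
                End If
            Next
            If Directory.GetFileSystemEntries(folder).Length = 0 Then
                Directory.Delete(folder)
            End If
        Next
    End Sub
End Module
```

A scheduled task could then call something like CleanUp(uploadRoot, DateTime.Now.AddDays(-30)), where uploadRoot and the 30-day window are placeholders for your own values.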
As I said, I don't know if this is standard (or even if it's a really good practice), but it worked well for the environment where I used it.