Currently I have the below code for reading an InputStream. I am storing the whole file into a StringBuilder variable and processing this string afterwards.
public static String getContentFromInputStream(InputStream inputStream)
// public static String getContentFromInputStream(InputStream inputStream,
//     int maxLineSize, int maxFileSize)
{
    StringBuilder stringBuilder = new StringBuilder();
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
    String lineSeparator = System.getProperty("line.separator");
    String fileLine;
    boolean firstLine = true;
    try {
        // Expect some function here which checks for a line size limit,
        // e.g. reading character by character into a char array and checking the
        // line size in a loop until a line feed is encountered.
        // If the max line size limit is exceeded, throw an exception.
        // If a line feed is encountered, append the char array to the StringBuilder.
        // After appending, check the size of the StringBuilder;
        // if it exceeds the max file limit, throw an exception.
        fileLine = bufferedReader.readLine();
        while (fileLine != null) {
            if (!firstLine) stringBuilder.append(lineSeparator);
            stringBuilder.append(fileLine);
            fileLine = bufferedReader.readLine();
            firstLine = false;
        }
    } catch (IOException e) {
        // TODO: throw or handle the exception
    }
    // TODO: close the stream
    return stringBuilder.toString();
}
The code went for a review with the Security team and the following comments were received:
BufferedReader.readLine is susceptible to DOS (Denial of Service) attacks (a line of infinite length, or a huge file containing no line feed/carriage return).
Resource exhaustion for the StringBuilder variable (cases where the file contains more data than the available memory).
Below are the solutions I could think of:
Create an alternate implementation of the readLine method (readLine(int limit)), which checks the number of bytes read and throws a custom exception if it exceeds the specified limit. A rough sketch of this idea is shown after this list.
Process the file line by line without loading the file in its entirety. (pure non-Java solution :) )
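A minimal sketch of what such a bounded readLine could look like (readLine(Reader, int) is a placeholder helper, not an existing API, and it throws a plain IOException where a custom exception could be substituted):
// import java.io.IOException;
// import java.io.Reader;
private static String readLine(Reader reader, int maxLineSize) throws IOException {
    StringBuilder line = new StringBuilder();
    int c;
    while ((c = reader.read()) != -1) {
        if (c == '\n') {                      // line feed ends the line
            return line.toString();
        }
        if (c != '\r') {                      // skip carriage returns
            line.append((char) c);
        }
        if (line.length() > maxLineSize) {    // hypothetical line size limit
            throw new IOException("Line exceeds " + maxLineSize + " characters");
        }
    }
    return line.length() > 0 ? line.toString() : null;   // null signals end of stream
}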
Please suggest if there are any existing libraries which implement the above solutions. Also suggest any alternate solutions which offer more robustness or are more convenient to implement than the proposed ones. Though performance is also a major requirement, security comes first.
You want to avoid all sorts of DOS attacks (on lines, on the size of the file, etc.). But at the end of the function, you're trying to convert the entire file into one single String!!! Assume that you limit the line to 8 KB, but what happens if somebody sends you a file with two million 8 KB lines? The line-reading part will pass, but when you finally combine everything into a single string, the String will choke all available memory.
So since you're ultimately converting everything into one single String, limiting the line size neither matters nor is safe. You have to limit the total size of the file.
Secondly, what you're basically trying to do is read data in chunks. So you're using BufferedReader and reading it line by line. But what you really want at the end is some way of reading the file piece by piece. Instead of reading one line at a time, why not read 2 KB at a time?
BufferedReader - as its name suggests - has a buffer inside it, and you can configure that buffer. Let's say you create a BufferedReader with a buffer size of 2 KB:
BufferedReader reader = new BufferedReader(..., 2048);
Now if the InputStream that you pass to the BufferedReader has 100 KB of data, the BufferedReader will automatically read it 2 KB at a time. So it will read the stream 50 times, 2 KB each (50 x 2 KB = 100 KB). Similarly, if you create the BufferedReader with a 10 KB buffer size, it will read the input 10 times (10 x 10 KB = 100 KB).
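To make that chunked reading explicit, here is a minimal sketch; the overall cap is an assumption, not something BufferedReader enforces for you:
final int MAX_CONTENT_SIZE = 1024 * 1024;          // assumed cap on total characters
char[] buffer = new char[2048];                    // 2 KB chunk
StringBuilder content = new StringBuilder();
int read;
while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
    content.append(buffer, 0, read);               // append only what was actually read
    if (content.length() > MAX_CONTENT_SIZE) {     // overall limit exceeded
        throw new IOException("Input exceeds the maximum allowed size");
    }
}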
BufferedReader already does the work of reading your file chunk by chunk. So you don't want to add an extra line-by-line layer on top of it. Just focus on the end result: if your file is too big (> available RAM), how are you going to convert it into a String at the end?
One better way is to just pass things around as a CharSequence. That's what Android does: throughout the Android APIs, you will see that they return CharSequence everywhere. Since StringBuilder also implements CharSequence, Android will internally use either a String, a StringBuilder, or some other optimized string class, based on the size/nature of the input. So you could directly return the StringBuilder object itself once you've read everything, rather than converting it to a String. This would be safer against large data: StringBuilder already manages its own internal buffer, and returning it avoids making yet another full copy of the data the way toString() does.
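As a sketch, that only changes the method's return type; the reading logic stays as it is:
public static CharSequence getContentFromInputStream(InputStream inputStream) {
    StringBuilder stringBuilder = new StringBuilder();
    // ... read into stringBuilder exactly as before ...
    return stringBuilder;   // no extra copy, unlike stringBuilder.toString()
}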
So overall:
Using Apache Commons IO, here is how you would read data from a BoundedInputStream into a StringBuilder, splitting by 2 KB blocks instead of lines:
// import org.apache.commons.io.output.StringBuilderWriter;
// import org.apache.commons.io.input.BoundedInputStream;
// import org.apache.commons.io.IOUtils;
BoundedInputStream boundedInput = new BoundedInputStream(originalInput, <max-file-size>);
BufferedReader reader = new BufferedReader(new InputStreamReader(boundedInput), 2048);
StringBuilder output = new StringBuilder();
StringBuilderWriter writer = new StringBuilderWriter(output);
IOUtils.copy(reader, writer); // copies data from "reader" => "writer"
return output;
Use BoundedInputStream from the Apache Commons IO library. Your work becomes much easier.
The following code will do what you want:
public static String getContentFromInputStream(InputStream inputStream) {
    inputStream = new BoundedInputStream(inputStream, <number-of-bytes>);
    // the rest of the code stays the same
You simply wrap your InputStream with a BoundedInputStream and specify a maximum size. BoundedInputStream will take care of limiting reads up to that maximum size.
Or you can do this when you're creating the reader:
BufferedReader bufferedReader = new BufferedReader(
    new InputStreamReader(
        new BoundedInputStream(inputStream, <no-of-bytes>)
    )
);
Basically, what we're doing here is limiting the read size at the InputStream layer itself, rather than doing it when reading lines. So you end up with a reusable component like BoundedInputStream, which limits reading at the InputStream layer, and you can use it wherever you want.
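Putting it together, your original method could look roughly like this (MAX_BYTES is an assumed limit of your choosing):
// import org.apache.commons.io.input.BoundedInputStream;
public static String getContentFromInputStream(InputStream inputStream) throws IOException {
    final long MAX_BYTES = 1024 * 1024;   // assumed limit: 1 MB
    StringBuilder stringBuilder = new StringBuilder();
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new BoundedInputStream(inputStream, MAX_BYTES)))) {
        String fileLine;
        boolean firstLine = true;
        while ((fileLine = bufferedReader.readLine()) != null) {
            if (!firstLine) {
                stringBuilder.append(System.lineSeparator());
            }
            stringBuilder.append(fileLine);
            firstLine = false;
        }
    }
    return stringBuilder.toString();
}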
There are basically 4 ways to do file processing:
1. Stream-Based Processing (the java.io.InputStream model): Optionally put a BufferedReader around the stream, iterate and read the next available text from the stream (if no text is available, block until some becomes available), and process each piece of text independently as it's read (catering for widely varying sizes of text pieces).
2. Chunk-Based Non-Blocking Processing (the java.nio.channels.Channel model): Create a set of fixed-size buffers (representing the "chunks" to be processed), read into each of the buffers in turn without blocking (the NIO API delegates to native IO, using fast OS-level threads); your main processing thread picks up each buffer in turn once it is filled and processes the fixed-size chunk, while the other buffers continue to be asynchronously loaded.
3. Part File Processing (including line-by-line processing) (can leverage (1) or (2) to isolate or build up each "part"): break your file format down into semantically meaningful sub-parts (if possible! breaking into lines could be possible!), iterate through stream pieces or chunks, and build up content in memory until the next part is completely built; process each part as soon as it's built (see the sketch after this list).
4. Entire File Processing (the java.nio.file.Files model): Read the entire file into memory in one operation and process the complete contents.
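As an illustration of option 3 with lines as the "parts", here is a minimal sketch; processLine is a placeholder for whatever per-line work you need, and the earlier caveat about unbounded lines still applies unless the stream is also bounded:
try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
    String line;
    while ((line = reader.readLine()) != null) {
        processLine(line);   // placeholder: process one part, then discard it
    }
}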
Which one should you use?
It depends on your file contents and the type of processing you require.
From a resource-use efficiency perspective (best to worst): 1, 2, 3, 4.
From a processing speed and efficiency perspective (best to worst): 2, 1, 3, 4.
From an ease of programming perspective (best to worst): 4, 3, 1, 2.
However, some types of processing might require more than the smallest piece of text (ruling out 1, and maybe 2) and some file formats may not have internal parts (ruling out 3).
You're doing 4. I suggest you shift to 3 (or lower), if you can.
Under 4, there's only one way to avoid DOS - limit the size before it's read into memory (or, for that matter, copied to your file system). It's too late once it's read in. If this is not possible, then try 3, 2, or 1.
Limiting File Size
Often the file is uploaded via an HTML form.
If uploading using the Servlet @MultipartConfig annotation and request.getPart().getInputStream(), you have control over how much data you read from the stream. Also, request.getPart().getSize() returns the file size in advance, and if it's small enough, you can do request.getPart().write(path) to write the file to disk.
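A minimal sketch of that servlet approach (the "file" field name and the 1 MB limit are assumptions for illustration):
// import java.io.IOException;
// import javax.servlet.ServletException;
// import javax.servlet.annotation.MultipartConfig;
// import javax.servlet.annotation.WebServlet;
// import javax.servlet.http.*;
@WebServlet("/upload")
@MultipartConfig(maxFileSize = 1024 * 1024)               // container rejects larger parts
public class UploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Part part = request.getPart("file");              // "file" = assumed form field name
        if (part.getSize() <= 1024 * 1024) {              // size is known before reading
            part.write("upload.tmp");                     // relative to the configured location
        }
    }
}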
If uploading using JSF, then JSF 2.2 (very new) has the standard HTML component <h:inputFile> (javax.faces.component.html.InputFile), which has an attribute for maxLength; pre-JSF 2.2 implementations have similar custom components (e.g. Tomahawk has <t:inputFileUpload> with a maxLength attribute; PrimeFaces has <p:fileUpload> with a sizeLimit attribute).
Alternatives to Read Entire File
Your code, which uses InputStream, StringBuilder, etc., is an efficient way to read the entire file, but it is not necessarily the simplest way (fewest lines of code).
Junior/average developers could get the misapprehension that you're doing efficient stream-based processing when you're actually processing the entire file - so include appropriate comments.
If you want less code, you could try one of the following:
List<String> stringList = java.nio.file.Files.readAllLines(path, charset);
or
byte[] byteContents = java.nio.file.Files.readAllBytes(path);
But they require care, or they could be inefficient in resource usage. If you use readAllLines and then concatenate the List elements into a single String, you would consume double the memory (for the List elements plus the concatenated String). Similarly, if you use readAllBytes followed by encoding to a String (new String(byteContents, charset)), then again you're using "double" the memory. So it's best to process directly against the List<String> or byte[], unless you limit your files to a small enough size.
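For example, rather than joining the lines into one String, you could work on the List<String> directly (processLine is a placeholder for your own per-line logic):
List<String> lines = java.nio.file.Files.readAllLines(path, charset);
for (String line : lines) {
    processLine(line);   // placeholder: operate on each line without concatenating
}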