I have a very large file (several GB) in AWS S3, and I only need a small number of lines from the file which satisfy a certain condition. I don't want to load the entire file into memory and then search for and print those few lines - the memory load for this would be too high. The right way would be to load into memory only the lines that are needed.
As per the AWS documentation, to read from the file:
S3Object fullObject = s3Client.getObject(new GetObjectRequest(bucketName, key));
displayTextInputStream(fullObject.getObjectContent());

private static void displayTextInputStream(InputStream input) throws IOException {
    // Read the text input stream one line at a time and display each line.
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String line = null;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
    System.out.println();
}
Here we are using a BufferedReader. It is not clear to me what is happening under the hood here.
Are we making a network call to S3 each time we read a new line, keeping only the current line in the buffer? Or is the entire file loaded into memory and then read line by line by the BufferedReader? Or is it somewhere in between?
Part of the answer to your question is already given in the documentation you linked:
Your network connection remains open until you read all of the data or close the input stream.
A BufferedReader doesn't know where the data it reads is coming from, because you're passing another Reader to it. A BufferedReader creates a buffer of a certain size (e.g. 4096 characters) and fills this buffer by reading from the underlying Reader before handing data out to calls of read() or read(char[] buf).

The Reader you pass to the BufferedReader is, by the way, using another buffer of its own to do the conversion from a byte-based stream to a char-based reader. It works the same way as BufferedReader does, so the internal buffer is filled by reading from the passed InputStream, which is the InputStream returned by your S3 client.
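As a side note, the buffer size of the BufferedReader can be chosen explicitly if the default does not suit you. A minimal sketch, assuming the same input variable as in your snippet (the 64 KiB size and the UTF-8 charset are arbitrary examples, not recommendations):

// Requires java.nio.charset.StandardCharsets in addition to the imports above.
// Only this buffer plus the decoder's internal byte buffer are held in memory.
BufferedReader reader = new BufferedReader(
        new InputStreamReader(input, StandardCharsets.UTF_8),
        64 * 1024);   // buffer size in chars; the default (8192) is usually fine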
What exactly happens within this client when you read from the stream is implementation-dependent. One approach is to keep a single network connection open and let you read from it as you go; another is to close the connection after a chunk of data has been read and open a new one when you request the next chunk.
The documentation quoted above seems to say that we have the former situation here, so: no, calls to readLine do not lead to individual network calls.
And to answer your other question: no, a BufferedReader, the InputStreamReader, and most likely the InputStream returned by the S3 client do not load the whole document into memory. That would contradict the whole purpose of using streams in the first place, and the S3 client could otherwise simply return a byte[][] instead (to get around the limit of roughly 2^31 bytes per byte array).
Edit: There is an exception to the last paragraph. If the whole multi-gigabyte document contains no line breaks, calling readLine will actually read all of the data into memory (and most likely lead to an OutOfMemoryError). I assumed a "regular" text document while answering your question.
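To tie this back to your original goal of keeping only the few matching lines in memory, a sketch along these lines should work; s3Client, bucketName and key are taken over from your question, and the contains("needle") check merely stands in for whatever condition you need:

// Same com.amazonaws and java.io imports as in your snippet, plus java.util.List/ArrayList.
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
List<String> matches = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(object.getObjectContent()))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.contains("needle")) {   // placeholder for your condition
            matches.add(line);           // only the matching lines are kept in memory
        }
    }
}

The try-with-resources block also makes sure the underlying network connection is closed even if you stop reading early.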
If you are not searching for specific words but instead know the byte range you need, you can also use the Range header in S3. This is especially useful since you are working with a single file of several GB. Specifying a Range not only reduces memory usage, it is also faster, because only the specified part of the file is read.
See Is there "S3 range read function" that allows to read assigned byte range from AWS-S3 file?
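For completeness, a sketch of such a ranged read with the v1 Java SDK; the byte offsets below are made-up values, and the range is inclusive on both ends:

// Fetch only the requested slice of the object instead of the whole file.
GetObjectRequest rangedRequest = new GetObjectRequest(bucketName, key)
        .withRange(1_000_000, 1_999_999);   // hypothetical start and end offsets
S3Object part = s3Client.getObject(rangedRequest);
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(part.getObjectContent()))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}

Keep in mind that a ranged read can start or end in the middle of a line, so the first and last lines of the returned chunk may be partial.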
Hope this helps.
Sreram