 

Reading an input stream twice without storing it in memory

With reference to the linked Stack Overflow question, it is said that an InputStream can be read multiple times using the mark() and reset() methods provided by InputStream, or by using PushbackInputStream.
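For context, here is a minimal sketch of the mark()/reset() approach I mean (the buffer size and sample data are just for illustration); BufferedInputStream keeps the bytes read since mark() in an in-memory buffer so reset() can rewind:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes();
        // BufferedInputStream supports mark/reset by buffering the bytes it reads
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        in.mark(data.length); // readlimit: how many bytes may be read before reset() becomes invalid
        byte[] first = in.readNBytes(5);   // first pass over the start of the stream
        in.reset();                        // rewind to the marked position
        byte[] second = in.readNBytes(5);  // second pass re-reads the same bytes

        System.out.println(new String(first).equals(new String(second))); // prints true
    }
}
```

This works only because the buffered bytes are held in memory, which is exactly my concern for large files.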

In all these cases, the content of the stream is stored in a byte array (i.e., the original content of the file is held in main memory) and reused multiple times.

What happens when the size of the file exceeds the available memory? I think this may lead to an OutOfMemoryError.

Is there a better way to read the stream content multiple times without storing the stream content locally (i.e., in main memory)?

Please help me understand this. Thanks in advance.

Tom Taylor, asked Jul 13 '16


1 Answer

It depends on the source of the stream.

If it's a local file, you can likely re-open and re-read the stream as many times as you want.
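A minimal sketch of that idea (the file path and helper name here are just for illustration): each pass simply opens a fresh stream over the same file, so no pass needs to hold the file's content in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReReadFile {
    // Each pass opens a fresh InputStream and reads through a small buffer,
    // so memory use is bounded regardless of file size.
    static long countBytes(Path file) throws IOException {
        long count = 0;
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                count += n;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".bin");
        Files.write(file, new byte[]{1, 2, 3, 4, 5});

        // The two passes are completely independent reads of the same file.
        System.out.println(countBytes(file)); // prints 5
        System.out.println(countBytes(file)); // prints 5
    }
}
```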

If it's dynamically generated by a process, a remote service, etc., you might not be free to re-generate it. In that case, you need to store it, either in memory or in some more persistent (and slow) storage like a file system or storage service.
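The "spool to slower storage" option might look like this sketch, which stands in a ByteArrayInputStream for the one-shot source; Files.copy transfers the stream to disk through a small internal buffer, so memory use stays bounded even for very large streams:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToDisk {
    public static void main(String[] args) throws IOException {
        // Pretend this is a one-shot stream (e.g., a network response)
        InputStream oneShot = new ByteArrayInputStream("stream payload".getBytes());

        // Spool the stream to a temp file; only a small buffer is in memory at a time.
        Path spool = Files.createTempFile("spool", ".tmp");
        Files.copy(oneShot, spool, StandardCopyOption.REPLACE_EXISTING);

        // Now the content can be re-read any number of times from disk.
        try (InputStream first = Files.newInputStream(spool);
             InputStream second = Files.newInputStream(spool)) {
            System.out.println(new String(first.readAllBytes()));  // prints "stream payload"
            System.out.println(new String(second.readAllBytes())); // prints "stream payload"
        }
        Files.delete(spool);
    }
}
```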


Maybe an analogy would help. Suppose your friend is speaking to you at length. You listen carefully without interruption, but when they are done, you realize you didn't understand something they said near the beginning, and want to review that portion.

At this point, there are a few possibilities.

Perhaps your friend was actually reading aloud from a book. You can simply re-read the book.

Or, perhaps you had the foresight to record their monologue. You can replay the recording.

However, since neither you nor your friend has perfect and unlimited recall, simply repeating verbatim what was said ten minutes ago from memory alone is not an option.

An InputStream is like your friend speaking. Neither of you has a good enough memory to remember exactly, word-for-word, what is said. In the same way, neither a process that is generating the data stream nor your program has enough RAM to store, byte-for-byte, the stream. To scale, your program has to rely on its "short-term memory" (RAM), working with just a small portion of the whole stream at any given time, and "taking notes" (writing to a persistent store) as it encounters important points.
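As a sketch of that "short-term memory plus notes" pattern (the checksum here is just an example of a running summary): the program reads through a fixed-size buffer and keeps only an accumulated result, never the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedChecksum {
    // Processes a stream of any size using a fixed-size buffer ("short-term
    // memory"), keeping only a running summary rather than the whole stream.
    static long checksum(InputStream in) throws IOException {
        byte[] buf = new byte[4096]; // bounded working memory, independent of stream size
        long sum = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                sum += buf[i] & 0xFF; // "take notes": accumulate, then discard the chunk
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        java.util.Arrays.fill(data, (byte) 1);
        System.out.println(checksum(new ByteArrayInputStream(data))); // prints 10000
    }
}
```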

If the source of the stream is a local file, then it's like your friend reading from a book. Either of you can re-read that content easily enough.

If you copy the stream to some persistent storage, that's like recording your friend's speech. You can replay it as often as you like.


Consider a scenario where a browser is uploading a large file, but the server is busy and unable to read that stream for some time. Where is that data stored during the delay?

Because the receiver can't always respond immediately to input, TCP and many other protocols allocate a small buffer to store some data from a sender. But they also have a way to tell the sender to wait because it is sending data too fast: flow control. Going back to the analogy, it's like telling your friend to pause a moment while you catch up with your note-taking.

As the browser uploads the file, at first, the buffer will be filled. But if the server can't keep up, the browser will be instructed to pause its upload until there is more room in the buffer. (This generally happens at the OS and TCP level; the client and server applications don't manage this directly.) The upload speed depends on how fast the browser can read the file from disk, how fast the network link is, and how fast the server can process the uploaded data. Even a fast network and client will be limited by the weak link in this chain.

erickson, answered Oct 12 '22