
Reading a file vs loading a file into main memory from disk for processing

Tags: java, file

How do I load a file into main memory?

When I read a file, I use:

    BufferedReader buf = new BufferedReader(new FileReader(fileName));

I presume that this is reading the file line by line from the disk. What is the advantage of this?

What is the advantage of loading the file directly into memory? How do we do that in Java?

I found some examples that use Scanner or RandomAccessFile. Do they load the file into memory? Should I use them? Which of the two should I use?

Thanks in advance!!!

asked Oct 27 '12 by Mahalakshmi Lakshminarayanan


2 Answers

    BufferedReader buf = new BufferedReader(new FileReader(fileName));

I presume that this is reading the file line by line from the disk. What is the advantage of this?

Not exactly. It reads the file from disk in chunks the size of its internal buffer (8192 characters by default for BufferedReader) and then hands you one line at a time out of that buffer.

The advantage is that you don't need a huge heap to read a huge file. This is a significant issue since the maximum heap size can only be specified at JVM startup (with HotSpot Java).

You also don't consume the system's physical / virtual memory resources to represent the huge heap.
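
For illustration, a minimal sketch of that streaming style (the file name is just a placeholder); only one buffer-full and the current line are held in memory at any time:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class LineByLine {
        public static void main(String[] args) throws IOException {
            // "input.txt" is a placeholder path for this sketch.
            // The BufferedReader refills its internal buffer (8192 chars by default)
            // from the FileReader; the program only ever holds the current line.
            try (BufferedReader in = new BufferedReader(new FileReader("input.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line.length());   // stand-in for real processing
                }
            }
        }
    }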

What is the advantage of loading the file directly into memory?

It reduces the number of system calls, and may read the file faster. How much faster depends on a number of factors. And you have the problem of dealing with really large files.

How do we do that in Java?

  1. Find out how large the file is.
  2. Allocate a byte (or character) array big enough.
  3. Use the relevant read(byte[], int, int) or read(char[], int, int) method to read the entire file.
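
Putting those three steps together, a rough sketch might look like this (the file name is a placeholder; DataInputStream.readFully is used as a convenient way to loop the read calls until the array is full):

    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class ReadWholeFile {
        public static void main(String[] args) throws IOException {
            File file = new File("input.txt");      // placeholder path
            long length = file.length();            // 1. find out how large the file is
            if (length > Integer.MAX_VALUE) {
                throw new IOException("too large for a single array");
            }
            byte[] data = new byte[(int) length];   // 2. allocate a big enough array
            try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
                in.readFully(data);                 // 3. loops read(byte[], int, int) until the array is filled
            }
            System.out.println("Read " + data.length + " bytes");
        }
    }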

You can also use a memory-mapped file ... but that requires the NIO Buffer APIs, which can be a bit tricky to use.
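
As a sketch of that memory-mapped alternative (again with a placeholder file name), a read-only mapping through FileChannel looks roughly like this; note that a single mapping is limited to 2 GB, so truly huge files need several mappings:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MappedRead {
        public static void main(String[] args) throws IOException {
            // "input.txt" is a placeholder path for this sketch.
            try (FileChannel ch = FileChannel.open(Paths.get("input.txt"),
                                                   StandardOpenOption.READ)) {
                // The OS pages the file in on demand; nothing is copied onto the Java heap.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                long sum = 0;
                while (buf.hasRemaining()) {
                    sum += buf.get();               // read through the Buffer API
                }
                System.out.println("byte sum: " + sum);
            }
        }
    }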

I found some examples that use Scanner or RandomAccessFile. Do they load the file into memory?

No, and no.

Should I use them? Which of the two should I use?

Do they provide the functionality that you require? Do you need to read / parse text-based data? Do you need random access to binary data?

Under normal circumstances, you should choose your I/O APIs based primarily on the functionality that you require, and only secondarily on performance considerations. Using a BufferedInputStream or BufferedReader is usually enough to get acceptable* performance if you intend to parse the file as you read it. (But if you actually need to hold the entire file in memory in its original form, then a BufferedXxx wrapper class actually makes reading a bit slower.)
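
For instance, if the job is to pull numbers out of a text file while streaming it, a Scanner over a buffered reader is usually all that is needed; a sketch, assuming a placeholder file of whitespace-separated integers:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Scanner;

    public class SumNumbers {
        public static void main(String[] args) throws IOException {
            // "numbers.txt" is a placeholder; assume it holds whitespace-separated integers.
            try (Scanner sc = new Scanner(new BufferedReader(new FileReader("numbers.txt")))) {
                long total = 0;
                while (sc.hasNextInt()) {
                    total += sc.nextInt();          // parse while streaming; no full copy in memory
                }
                System.out.println("total = " + total);
            }
        }
    }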


* - Note that acceptable performance is not the same as optimal performance, but your client / project manager probably would not want you to waste time writing code that performs optimally ... if that is not a stated requirement.

answered by Stephen C


If you're reading in the file and then parsing it, walking from beginning to end once to extract your data and never referencing the file again, a buffered reader is about as "optimal" as you'll get. You can tune the performance somewhat by adjusting the buffer size -- a larger buffer reads larger chunks from the file. (Make the buffer a power of 2, e.g. 262144.) Reading an entire large file (larger than, say, 1 MB) into memory will generally cost you performance in paging and heap management.
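
As a sketch of that tuning (the file name is a placeholder; 262144 is just the power-of-two figure mentioned above):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class TunedReader {
        public static void main(String[] args) throws IOException {
            // "big.log" is a placeholder; the second argument is the buffer size in chars.
            try (BufferedReader in = new BufferedReader(new FileReader("big.log"), 262144)) {
                long lines = 0;
                while (in.readLine() != null) {
                    lines++;                        // single front-to-back pass
                }
                System.out.println(lines + " lines");
            }
        }
    }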

answered by Hot Licks