Here is the problem I'm trying to solve:
I have about 100 binary files (158 KB in total; they are roughly the same size, within +/- 50% of each other). I need to selectively parse only a few of these files (in the worst case maybe 50, in other cases as few as 1 to 5). This is on an Android device, by the way.
What is the fastest way to do this in Java?
One way could be to combine everything into one file and then use file seek to get to each individual file. That way, file open (which is usually slow) would only need to be called once. However, to know where each file is, there would need to be some sort of table at the beginning of the file, which could be generated by a script, but the files would also need to be indexed in the table in the order they were concatenated so file seek wouldn't have to do much work (correct me if I'm wrong).
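For concreteness, a minimal sketch of what I have in mind. The table layout (an entry count followed by name, offset, and length records) is hypothetical; the build script would have to write the same format:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch. The combined file starts with an int entry count, then
// for each entry a UTF string name, a long offset, and an int length
// (a made-up layout; the build script would need to write the same thing).
public class CombinedFile {
    private final RandomAccessFile raf;
    private final Map<String, long[]> table = new HashMap<>(); // name -> {offset, length}

    public CombinedFile(String path) throws IOException {
        raf = new RandomAccessFile(path, "r");
        int count = raf.readInt();
        for (int i = 0; i < count; i++) {
            String name = raf.readUTF();
            long offset = raf.readLong();
            int length = raf.readInt();
            table.put(name, new long[] { offset, length });
        }
    }

    // Seek directly to one embedded file and read its bytes.
    public byte[] read(String name) throws IOException {
        long[] entry = table.get(name);
        byte[] data = new byte[(int) entry[1]];
        raf.seek(entry[0]);
        raf.readFully(data);
        return data;
    }
}
```

Although, since each record stores an absolute offset, seek() can jump straight to any entry, so maybe the table doesn't actually need to be in concatenation order.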
A better way would be to memory-map the file; then the table wouldn't have to be in the order of concatenation, because a memory-mapped file has random access (again, correct me if I'm wrong).
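Something like this is what I mean (the file name, offset, and length are made up):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public final class MappedAccess {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("combined.bin", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            // Random access: jump to any offset from the table,
            // no per-read seek system call needed.
            int offset = 4096;  // hypothetical offset of one embedded file
            int length = 1024;  // hypothetical length
            byte[] data = new byte[length];
            buf.position(offset);
            buf.get(data);
        }
    }
}
```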
Creating that table would be unnecessary if zip compression were used, because a zip archive already contains such a table. In addition, the files wouldn't have to be concatenated by hand: I could zip the directory and then access each individual file through its entry in the zip archive. Problem solved.
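Accessing one entry would then look roughly like this (readEntry is an illustrative helper; it assumes the entry size is known, which it is when ZipFile reads it from the central directory):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public final class ZipRead {
    // Read the full contents of one entry by name.
    public static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            byte[] data = new byte[(int) entry.getSize()];
            try (InputStream in = zip.getInputStream(entry)) {
                int read = 0;
                while (read < data.length) {
                    int n = in.read(data, read, data.length - read);
                    if (n < 0) throw new IOException("Unexpected end of entry");
                    read += n;
                }
            }
            return data;
        }
    }
}
```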
Except that if the zip file isn't memory-mapped, it will be slower to read, since system calls are slower than direct memory access (correct me if I'm wrong). So I came to the conclusion that the best solution would be a memory-mapped zip archive.
However, ZipFile entries return an InputStream to read the contents of each entry, and a MappedByteBuffer needs a RandomAccessFile, which takes a filename as input, not an InputStream.
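The closest workaround I can think of is to build the archive with compression turned off (zip -0, the STORED method), map the whole archive, and walk the local file headers by hand. A rough sketch, assuming no data descriptors (so the sizes are present in each local header) and ASCII/UTF-8 entry names; MappedZip and findEntry are names I made up:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public final class MappedZip {
    private static final int LOCAL_HEADER_SIG = 0x04034b50;

    // Returns the raw bytes of one STORED entry, or null if not found.
    public static byte[] findEntry(String zipPath, String name) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipPath, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN); // zip integers are little-endian
            // Local file headers sit back to back; the central directory
            // (which has a different signature) terminates the loop.
            while (buf.remaining() >= 30 && buf.getInt(buf.position()) == LOCAL_HEADER_SIG) {
                int base = buf.position();
                int compressedSize = buf.getInt(base + 18); // == uncompressed for STORED
                int nameLen = buf.getShort(base + 26) & 0xffff;
                int extraLen = buf.getShort(base + 28) & 0xffff;
                byte[] nameBytes = new byte[nameLen];
                buf.position(base + 30);
                buf.get(nameBytes);
                buf.position(base + 30 + nameLen + extraLen); // start of entry data
                if (name.equals(new String(nameBytes, StandardCharsets.UTF_8))) {
                    byte[] data = new byte[compressedSize];
                    buf.get(data);
                    return data;
                }
                buf.position(buf.position() + compressedSize); // skip to next header
            }
            return null;
        }
    }
}
```

But hand-parsing the zip format feels fragile, so: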
Is there any way to memory-map a zip file for fast reads? Or is there a different solution to this problem of reading a selection of files?
Thanks
EDIT: I tested the speeds of opening, closing, and parsing the files; here are the statistics I found:
Number of files: 25 (24 for Parse, because garbage collection interrupted the timing)
Total Open time: 72 ms
Total Close time: 1 ms
Total Parse time: 515 ms (skewed in Parse's favor because Parse is missing a file)
% of total time Open takes: 12%
% of total time Close takes: 0.17%
% of total time Parse takes: 88%
Avg time Open takes per file: 2.88 ms
Avg time Close takes per file: 0.04 ms
Avg time Parse takes per file: 21.46 ms
Reading and writing a memory-mapped file is largely handled by the operating system, which takes care of flushing the contents back to disk. Prefer a direct buffer to an indirect buffer for better performance. The memory used to load the file lives outside the Java heap, in shared memory, which gives you two different ways to access the file.
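To illustrate the difference between the two buffer kinds (the sizes here are arbitrary):

```java
import java.nio.ByteBuffer;

public final class Buffers {
    public static void main(String[] args) {
        // Indirect (heap) buffer: backed by a byte[] inside the Java heap.
        ByteBuffer heap = ByteBuffer.allocate(64 * 1024);

        // Direct buffer: allocated outside the heap, so the OS can read or
        // write it during I/O without copying through a heap array first.
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024);

        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
    }
}
```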
You can create a stream based on a byte buffer that resides in memory by using a ByteArrayInputStream and a ByteArrayOutputStream to read from and write to a byte buffer, in much the same way you read from and write to a file. The byte array contains the "file's" content, so you do not need a File object at all.
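A minimal sketch of that idea (InMemory is just an illustrative name):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public final class InMemory {
    public static void main(String[] args) throws IOException {
        // Write into a growable in-memory buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] { 1, 2, 3 });
        byte[] content = out.toByteArray(); // the "file's" content

        // Read it back through a stream, just like reading a file.
        ByteArrayInputStream in = new ByteArrayInputStream(content);
        int first = in.read(); // 1
    }
}
```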
A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory.
Memory-mapping is a mechanism that maps a portion of a file, or an entire file, on disk to a range of addresses within an application's address space. The application can then access files on disk in the same way it accesses dynamic memory.
I would use a simple API like RandomAccessFile for now and revisit the issue only if you really need to.
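Something like this would do (SimpleRead is an illustrative name):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Minimal sketch of the simple approach: open each small file
// on demand and read it in one call.
public final class SimpleRead {
    public static byte[] readAll(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] data = new byte[(int) raf.length()];
            raf.readFully(data);
            return data;
        }
    }
}
```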
Edit - I didn't know about MappedByteBuffer. That seems like the way to go. Why not do this with separate files first, and think about combining them later?