
Is it possible to memory map a compressed file?

We have large files with zlib-compressed binary data that we would like to memory map.

Is it even possible to memory map such a compressed binary file and access those bytes in an effective manner?

Are we better off just decompressing the data, memory mapping it, and compressing it again once we're done with our operations?

EDIT

I think I should probably mention that these files can be appended to at regular intervals.

Currently, this data is loaded from disk via NSMutableData and decompressed. We then perform arbitrary read/write operations on it. Finally, at some point we compress the data and write it back to disk.

Tim Reddy asked Oct 23 '22 23:10



1 Answer

Memory mapping is all about the 1:1 mapping of memory to disk. That's not compatible with automatic decompression: in a compressed file, the byte at a given decompressed offset has no fixed position on disk, so the 1:1 mapping breaks.
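
To make the 1:1 point concrete, here is a minimal sketch in Python (used purely for illustration; the file name `demo.z` and the payload are made up). Mapping a zlib file exposes the compressed bytes exactly as they sit on disk, and getting the original data back still takes an explicit decompression pass:

```python
import mmap
import os
import tempfile
import zlib

payload = b"hello, memory mapping! " * 1000          # made-up sample data
path = os.path.join(tempfile.mkdtemp(), "demo.z")    # hypothetical file name

with open(path, "wb") as f:
    f.write(zlib.compress(payload))

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping is byte-for-byte the file on disk, i.e. a zlib stream.
    starts_with_zlib_header = mapped[0] == 0x78
    # Offsets into the mapping are NOT offsets into the original data.
    same_as_payload = mapped[:len(payload)] == payload
    # Recovering the original bytes needs an explicit inflate pass.
    restored = zlib.decompress(mapped[:])
    mapped.close()
```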

I assume these files are read-only, since random-access writing to a compressed file is generally impractical. I would therefore assume that the files are somewhat static.

I believe this is a solvable problem, but it's not trivial, and you will need to understand the compression format. I don't know of any easily reusable software to solve it (though I'm sure many people have solved something like it in the past).

You could memory map the file and then provide a front-end adapter interface that fetches bytes at a given offset and length. You would scan the file once, decompressing as you go, and create a "table of contents" file that maps periodic nominal (decompressed) offsets to real (on-disk) offsets. This is just an optimization; you could also "discover" the table of contents as you fetch data. The algorithm would then look something like:

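  • Given nominal offset n, look up the greatest indexed real offset m whose nominal offset is no greater than n.
  • Load the 32 KB of decompressed data that precedes m into a buffer (32 KB is the largest back-reference distance allowed in DEFLATE, and those distances refer to decompressed output, not to compressed bytes on disk).
  • Begin the DEFLATE algorithm at m, counting decompressed bytes until you reach n.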
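
The steps above can be sketched in Python, with one substitution that should be stated plainly: instead of saving the 32 KB window and restarting raw DEFLATE at a block boundary, this sketch snapshots the whole decompressor state with `Decompress.copy()` at regular intervals. The class name `CheckpointIndex` and the `step` parameter are made up for illustration, and `compressed` is assumed to be any sliceable bytes-like object holding one zlib stream (a `bytes` value or an `mmap.mmap`).

```python
import bisect
import zlib


class CheckpointIndex:
    """Table of contents: nominal (decompressed) -> real (compressed) offsets."""

    def __init__(self, compressed, step=64 * 1024):
        self.compressed = compressed
        self.step = step
        self.nominal = []   # decompressed offset at each checkpoint
        self.real = []      # compressed offset at each checkpoint
        self.states = []    # decompressor snapshot at each checkpoint
        d = zlib.decompressobj()
        out_pos = 0
        # Single scan: decompress chunk by chunk, snapshotting before each chunk.
        for in_pos in range(0, len(compressed), step):
            if d.eof:
                break       # stream finished; ignore any trailing bytes
            self.nominal.append(out_pos)
            self.real.append(in_pos)
            self.states.append(d.copy())
            out_pos += len(d.decompress(compressed[in_pos:in_pos + step]))
        self.size = out_pos  # total decompressed length

    def read(self, n, length):
        """Return up to `length` decompressed bytes starting at nominal offset n."""
        # Last checkpoint whose nominal offset is <= n.
        i = bisect.bisect_right(self.nominal, n) - 1
        d = self.states[i].copy()       # keep the stored snapshot reusable
        skip = n - self.nominal[i]      # bytes to discard before reaching n
        out = bytearray()
        pos = self.real[i]
        while len(out) < skip + length and pos < len(self.compressed) and not d.eof:
            out += d.decompress(self.compressed[pos:pos + self.step])
            pos += self.step
        return bytes(out[skip:skip + length])
```

A call such as `CheckpointIndex(mapped).read(n, length)` then only inflates from the nearest checkpoint onward. Each snapshot carries its own copy of the inflate state, including the sliding window, so a smaller `step` buys faster seeks at the cost of more memory.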

Obviously you'd want to cache the decompressed ranges you have already computed. NSCache and NSPurgeableData are ideal for this. Doing this really well and maintaining good performance would be challenging, but if it's a key part of your application it could be very valuable.
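
As a rough illustration of the caching idea, here is a small Python stand-in. It is not NSCache: NSCache evicts under memory pressure, while this sketch simply keeps a fixed number of blocks in least-recently-used order. The class `BlockCache` and the `fetch(offset, length)` callable are hypothetical; `fetch` stands for whatever routine produces decompressed bytes at a nominal offset.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of fixed-size decompressed blocks (illustrative only)."""

    def __init__(self, fetch, block_size=64 * 1024, max_blocks=32):
        self.fetch = fetch            # hypothetical: fetch(offset, length) -> bytes
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()   # block number -> decompressed bytes

    def _block(self, k):
        block = self.blocks.get(k)
        if block is None:
            # Cache miss: decompress the block once and remember it.
            block = self.fetch(k * self.block_size, self.block_size)
            self.blocks[k] = block
            if len(self.blocks) > self.max_blocks:
                self.blocks.popitem(last=False)   # evict least recently used
        else:
            self.blocks.move_to_end(k)            # mark as most recently used
        return block

    def read(self, offset, length):
        """Assemble a read from cached blocks, fetching only the missing ones."""
        out = bytearray()
        end = offset + length
        while offset < end:
            k, start = divmod(offset, self.block_size)
            chunk = self._block(k)[start:start + (end - offset)]
            if not chunk:
                break                 # past the end of the data
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

Repeated reads of the same region then cost a dictionary lookup rather than another decompression pass.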

Rob Napier answered Nov 15 '22 10:11
