What is the best compression algorithm that allows random reads/writes in a file?

I know that any adaptive compression algorithm would be out of the question.

And I know Huffman encoding would be out of the question.

Does anyone have a better compression algorithm that would allow random reads/writes?

I think you could use any compression algorithm if you wrote the file in blocks, though ideally I would rather not have to decompress a whole block at a time. If you have suggestions on an easy way to do this, and on how to know where the block boundaries are, please let me know (a rough sketch of one such block-and-index scheme appears below). If blocks are part of your solution, please also explain what you do when the data you want to read spans a block boundary.

In the context of your answers, please assume the file in question is 100 GB, and that sometimes I'll want to read the first 10 bytes, sometimes the last 19 bytes, and sometimes 17 bytes from the middle.
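For concreteness, here is a minimal sketch of the block-plus-index approach mentioned above, using Python's zlib. The 64 KB block size, the in-memory index, and the file names are illustrative assumptions, not part of the question, and a read that crosses a block boundary simply decompresses every block it touches:

    # Sketch: compress a file in independent fixed-size blocks and keep an
    # index of (compressed offset, compressed length) per block, so a byte
    # range can be read by decompressing only the blocks it overlaps.
    import zlib

    BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (arbitrary choice)

    def compress_to_blocks(src_path, dst_path):
        """Compress src in independent blocks; return the block index.
        In practice the index would be stored alongside the compressed file."""
        index = []
        offset = 0
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(BLOCK_SIZE)
                if not chunk:
                    break
                comp = zlib.compress(chunk)
                dst.write(comp)
                index.append((offset, len(comp)))
                offset += len(comp)
        return index

    def read_range(dst_path, index, start, length):
        """Read `length` uncompressed bytes starting at `start`,
        decompressing only the blocks that overlap the range."""
        first = start // BLOCK_SIZE
        last = min((start + length - 1) // BLOCK_SIZE, len(index) - 1)
        out = bytearray()
        with open(dst_path, "rb") as f:
            for b in range(first, last + 1):
                comp_off, comp_len = index[b]
                f.seek(comp_off)
                out += zlib.decompress(f.read(comp_len))
        skip = start - first * BLOCK_SIZE  # trim to the requested range
        return bytes(out[skip:skip + length])

With this layout, reading 17 bytes anywhere in a 100 GB file costs at most two block decompressions, at the price of one index entry per block.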

asked Oct 25 '08 by Brian R. Bondy




2 Answers

The razip format supports random-access reads with better performance than gzip or bzip2, which have to be tweaked to provide that kind of access:

http://sourceforge.net/projects/razip/

answered Sep 28 '22 by Erik Aronesty


I am stunned at the number of responses that imply that such a thing is impossible.

Have these people never heard of "compressed file systems", which have been around since before Microsoft was sued in 1993 by Stac Electronics over compressed file system technology?

I hear that LZS and LZJB are popular algorithms for people implementing compressed file systems, which necessarily require both random-access reads and random-access writes.

Perhaps the simplest and best thing to do is to turn on file system compression for that file, and let the OS deal with the details. But if you insist on handling it manually, perhaps you can pick up some tips by reading about NTFS transparent file compression.
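As a minimal sketch of the file-system route on Windows (the path below is hypothetical, and this assumes the file lives on an NTFS volume), you can mark the file compressed with the built-in compact tool and let the OS do the rest:

    # Sketch only: mark one file as NTFS-compressed via the built-in
    # "compact" command; NTFS then compresses and decompresses transparently
    # on every read and write, including random access.
    import subprocess

    def enable_ntfs_compression(path):
        subprocess.run(["compact", "/c", path], check=True)

    enable_ntfs_compression(r"D:\data\huge_file.bin")  # hypothetical 100 GB file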

Also check out: "StackOverflow: Compression formats with good support for random access within archives?"

answered Sep 28 '22 (2 revs)