An appendable compressed archive

I have a requirement to maintain a compressed archive of log files. The log filenames are unique and the archive, once expanded, is simply one directory containing all the log files.

The current solution isn't scaling well, since it involves a gzipped tar file. Every time a log file is added, they first decompress the entire archive, add the file, and re-gzip.
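That full decompress/add/recompress cycle can be sketched with Python's standard-library tarfile module (filenames here are invented for illustration). Note that tarfile itself refuses append mode ("a") on compressed archives, which is precisely the limitation being described:

```python
import io
import tarfile

# Build an initial gzipped tar with one log file (in memory for the sketch).
old = io.BytesIO()
with tarfile.open(fileobj=old, mode="w:gz") as tf:
    data = b"first log\n"
    info = tarfile.TarInfo("app-1.log")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

# The costly cycle: to add one member, every existing member is
# decompressed and recompressed into a fresh archive.
new = io.BytesIO()
old.seek(0)
with tarfile.open(fileobj=old, mode="r:gz") as src, \
     tarfile.open(fileobj=new, mode="w:gz") as dst:
    for member in src.getmembers():
        dst.addfile(member, src.extractfile(member))  # copy old entries
    data = b"second log\n"
    info = tarfile.TarInfo("app-2.log")
    info.size = len(data)
    dst.addfile(info, io.BytesIO(data))               # then add the new one
```

The work done here grows with the size of the whole archive, not with the size of the new log file, which is why this approach stops scaling.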

Is there a Unix archive tool that can add to a compressed archive without completely expanding and re-compressing? Or can gzip perform this, given the right combination of arguments?

Chap asked Jun 27 '13 01:06


1 Answer

I'm using zip -Zb for that (appending text logs incrementally to a compressed archive):

  • fast append (index is at the end of archive, efficient to update)
  • -Zb uses the bzip2 compression method instead of deflate. As of 2018 this seems safe to use (you'll need a reasonably modern unzip; note that some tools assume deflate when they see a zip file, so YMMV)
  • 7z was a good candidate: its compression ratio is vastly better than zip's when you compress all files in one operation. But when you append files one by one (incremental appending), the ratio is only marginally better than standard zip, and similar to zip -Zb. So for now I'm sticking with zip -Zb.
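The same fast-append behavior is exposed by Python's standard-library zipfile module, which may be a convenient sketch if the logs are produced by a script anyway (the log filenames below are made up). Append mode "a" adds a member without recompressing the existing ones:

```python
import io
import zipfile

# Build a small archive in memory (a real log archive would be a file on disk).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("app-2013-06-26.log", "first day of logs\n")

# Append a new member: mode "a" adds it without touching existing entries.
with zipfile.ZipFile(buf, "a", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("app-2013-06-27.log", "second day of logs\n")

with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())  # ['app-2013-06-26.log', 'app-2013-06-27.log']
```

The cost of each append is proportional to the new file plus the index, not to the whole archive.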

To clarify what happens, and why having the index at the end is useful for an "appendable" archive format with entries compressed individually:

Before:
############## ########### ################# #
[foo1.png    ] [foo2.png ] [foo3.png       ] ^
                                             |
                                         index

After:
############## ########### ################# ########### #
[foo1.png    ] [foo2.png ] [foo3.png       ] [foo4.png ] ^
                                                         |
                                                 new index

So this is not fopen in append mode, but presumably fopen in write mode, then fseek, then write (that's my mental model of it; someone let me know if this is wrong). I'm not 100% certain it would be that simple in practice; it might depend on the OS and file system (e.g. a file system with snapshots might have a very different opinion about how to deal with small writes at the end of a file… huge "YMMV" here 🤷🏻‍♂️)
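That mental model can be checked with a small Python zipfile sketch: after an append, every byte up to where the old index stood is unchanged, and the new entry starts exactly at that position (header_offset is the documented ZipInfo attribute giving where a member's local header begins):

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo1.png", b"imagine png bytes here")
before = buf.getvalue()           # [foo1 entry][index]

with zipfile.ZipFile(buf, "a") as zf:
    zf.writestr("foo4.png", b"the appended entry")
after = buf.getvalue()            # [foo1 entry][foo4 entry][new index]

# The appended entry starts exactly where the old index used to be...
with zipfile.ZipFile(io.BytesIO(after)) as zf:
    tail_start = zf.getinfo("foo4.png").header_offset

# ...and everything before that point is byte-for-byte identical,
# i.e. only the old index was overwritten, not the existing entries.
print(after[:tail_start] == before[:tail_start])  # True
```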

Hugues M. answered Oct 07 '22 21:10