
Archival filesystem or format

I'm looking for a file format for storing archives of systems that have been decommissioned. At the moment we primarily use tar.gz, but finding and extracting just a few files from a 200GB tar.gz archive is unwieldy, since tar.gz doesn't support any sort of random-access read provision. (And before you get the idea: mounting a tgz using FUSE doesn't make it better.)

Here's what we've found so far -- I'd like to know what other options there are:

  • tar.gz -- poor random-access read (see the example after this list)
  • zip -- lacks support for some advanced filesystem features (e.g. hard links, xattrs)
  • squashfs -- takes an extremely long time to create a large archive (many hours), and the userspace tools are poor.
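To illustrate the tar.gz problem: pulling one file out still decompresses and scans the archive from the beginning, whereas zip can seek straight to the member via its central directory. (Archive and member names below are made up.)

# tar has to gunzip and walk the archive from the start to find the member
tar -xzf archive.tar.gz path/to/one/file

# zip reads the central directory at the end and seeks directly to the member
unzip archive.zip path/to/one/file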

I'm trying to think of a simple way of packing a full-featured filesystem image into as small a space as possible -- for example, ext2 in a cloop image -- but that doesn't seem like a particularly user-friendly solution.
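A rough sketch of that approach (sizes and paths are made up, and the exact create_compressed_fs invocation varies between cloop-utils versions):

# build an ext2 image via a loop mount, preserving hard links and xattrs
truncate -s 200G archive.img
mkfs.ext2 -F archive.img
mkdir -p /mnt/archive
mount -o loop archive.img /mnt/archive
rsync -aHAX /srv/decommissioned/ /mnt/archive/
umount /mnt/archive

# compress the raw image into a cloop volume for random-access reads
create_compressed_fs archive.img archive.cloop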

Presumably this problem has been solved before -- are there any options I've missed?

asked May 27 '11 by tylerl


2 Answers

Mksquashfs is a highly parallelised program, and makes use of all available cores to maximise performance. If you're seeing very large build times then you either have a lot of duplicate files, or the machine is running short of memory and thrashing.

To investigate performance, you can first:

Use the -no-duplicates option on Mksquashfs, i.e.

mksquashfs xxx xxx.sqsh -no-duplicates

Duplicate checking is a slow operation and it has to be done sequentially, and on file sets with a lot of duplicates this becomes a bottleneck on an otherwise parallelised program.

Check memory usage/free memory while Mksquashfs is running; if the system is thrashing, performance will be very low. Investigate the -read-queue, -write-queue and -fragment-queue options to control how much data Mksquashfs caches at run-time.
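For instance, a sketch (the source path, output name and queue sizes are made up; the queue sizes are given in Mbytes):

# watch free memory and swap activity while the build runs
vmstat 5

# cap how much data Mksquashfs caches at run-time
mksquashfs /srv/decommissioned archive.sqsh -read-queue 64 -write-queue 64 -fragment-queue 32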

Tar and zip are not parallelised and use only one core, and so it is difficult to believe your complaint about Mksquashfs compression performance.

Also, I have never seen any other reports that the userspace programs are "poor". Mksquashfs and Unsquashfs have an advanced set of options which allow very fine control over the compression process and over which files are compressed - options considerably in advance of programs like tar.

Unless you can give concrete examples of why the tools are poor, I will put this down to the usual case of the workman blaming his tools when the real problem is elsewhere.

As I said previously, your system is probably thrashing and hence performing badly. By default Mksquashfs uses all available cores and a minimum of 600 Mbytes of RAM (rising to 2 Gbytes or more on large filesystems). This is for performance, as caching data in memory reduces disk I/O. This "out of the box" behaviour is good for typical users, who have large amounts of memory and an otherwise idle system. This is what the majority of users want: a Mksquashfs which "maxes out" the system to create the filesystem as fast as possible.

It is not good for systems with low RAM, or for systems with active processes consuming a large amount of the available CPU, and/or memory. You will simply get resource contention as each process contends for the available CPU and RAM. This is not a fault of Mksquashfs, but of the user.

The Mksquashfs -processors option is there to limit the number of processors Mksquashfs uses, and the -read-queue, -write-queue and -fragment-queue options are there to control how much RAM is used by Mksquashfs.
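For example (the paths are hypothetical):

# restrict Mksquashfs to two cores on a busy host
mksquashfs /srv/decommissioned archive.sqsh -processors 2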

answered Sep 28 '22 by Phillip Lougher


virt-sparsify can be used to sparsify and (through qemu's qcow2 gzip support) compress almost any Linux filesystem or disk image. The resulting images can be mounted in a VM, or on the host through guestmount.
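A minimal sketch, assuming the libguestfs tools are installed (the image names are hypothetical):

# rewrite a raw image as a sparsified, compressed qcow2
virt-sparsify --convert qcow2 --compress old-server.img old-server.qcow2

# inspect and mount it read-only on the host
guestmount -a old-server.qcow2 -i --ro /mnt/old-server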

There's a new nbdkit xz plugin that can be used for higher compression while still keeping good random-access performance (as long as you ask xz/pixz to reset compression on block boundaries).
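A sketch of that approach (file names and block size are made up; --block-size tells xz to start a new independently compressed block periodically, which is what makes seeking cheap):

# compress with independent blocks so random access stays cheap
xz -9 --block-size=16MiB old-server.img

# serve the .xz through nbdkit's xz plugin as a network block device
nbdkit xz file=old-server.img.xz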

answered Sep 28 '22 by Gabriel