Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Decompress tar files using C#

Tags:

c#

.net

tar

I'm searching a way to add embedded resource to my solution. This resources will be folders with a lot of files in them. On user demand they need to be decompressed.

I'm searching for a way do store such folders in executable without involving third-party libraries (Looks rather stupid, but this is the task).

I have found, that I can GZip and UnGZip them using standard libraries. But GZip handles single file only. In such cases TAR should come to the scene. But I haven't found TAR implementation among standard classes.

Maybe it possible decompress TAR with bare C#?

like image 571
shytikov Avatar asked Jan 14 '12 17:01

shytikov


People also ask

What does the tar C command do?

The tar command lets you create compressed archives which contain a particular file or set of files. The resultant archive files are commonly known as tarballs, gzip, bzip, or tar files. A tar file is a special format that groups files into one.

Can 7zip extract tar files?

7-Zip can also be used to unpack many other formats and to create tar files (amongst others).


1 Answers

While looking for a quick answer to the same question, I came across this thread, and was not entirely satisfied with the current answers, as they all point to using third-party dependencies to much larger libraries, all just to achieve simple extraction of a tar.gz file to disk.

While the gz format could be considered rather complicated, tar on the other hand is quite simple. At its core, it just takes a bunch of files, prepends a 500 byte header (but takes 512 bytes) to each describing the file, and writes them all to single archive on a 512 byte alignment. There is no compression, that is typically handled by compressing the created file to a gz archive, which .NET conveniently has built-in, which takes care of all the hard part.

Having looked at the spec for the tar format, there are only really 2 values (especially on Windows) we need to pick out from the header in order to extract the file from a stream. The first is the name, and the second is size. Using those two values, we need only seek to the appropriate position in the stream and copy the bytes to a file.

I made a very rudimentary, down-and-dirty method to extract a tar archive to a directory, and added some helper functions for opening from a stream or filename, and decompressing the gz file first using built-in functions.

The primary method is this:

public static void ExtractTar(Stream stream, string outputDir)
{
    var buffer = new byte[100];
    while (true)
    {
        stream.Read(buffer, 0, 100);
        var name = Encoding.ASCII.GetString(buffer).Trim('\0');
        if (String.IsNullOrWhiteSpace(name))
            break;
        stream.Seek(24, SeekOrigin.Current);
        stream.Read(buffer, 0, 12);
        var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim(), 8);

        stream.Seek(376L, SeekOrigin.Current);

        var output = Path.Combine(outputDir, name);
        if (!Directory.Exists(Path.GetDirectoryName(output)))
            Directory.CreateDirectory(Path.GetDirectoryName(output));
        using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
        {
            var buf = new byte[size];
            stream.Read(buf, 0, buf.Length);
            str.Write(buf, 0, buf.Length);
        }

        var pos = stream.Position;

        var offset = 512 - (pos  % 512);
        if (offset == 512)
            offset = 0;

        stream.Seek(offset, SeekOrigin.Current);
    }
}

And here is a few helper functions for opening from a file, and automating first decompressing a tar.gz file/stream before extracting.

public static void ExtractTarGz(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTarGz(stream, outputDir);
}

public static void ExtractTarGz(Stream stream, string outputDir)
{
    // A GZipStream is not seekable, so copy it first to a MemoryStream
    using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
    {
        const int chunk = 4096;
        using (var memStr = new MemoryStream())
        {
            int read;
            var buffer = new byte[chunk];
            do
            {
                read = gzip.Read(buffer, 0, chunk);
                memStr.Write(buffer, 0, read);
            } while (read == chunk);

            memStr.Seek(0, SeekOrigin.Begin);
            ExtractTar(memStr, outputDir);
        }
    }
}

public static void ExtractTar(string filename, string outputDir)
{
    using (var stream = File.OpenRead(filename))
        ExtractTar(stream, outputDir);
}

Here is a gist of the full file with some comments.

like image 54
ForeverZer0 Avatar answered Sep 30 '22 14:09

ForeverZer0