
SharpCompress & LZMA2 7z archive - very slow extraction of specific file. Why? Alternatives?

I have a 7zip archive created with LZMA2 compression (compression level: ultra). The archive contains 1,749 files with an original total size of 661 MB; the compressed archive is 39 MB.

Now I'm trying to use C# to extract a single tiny (~200 KB) file from this archive.

I'm getting the corresponding IArchiveEntry from the IArchive (which works relatively fast), but then calling IArchiveEntry.WriteToFile(targetPath) takes around 33 seconds! It takes similarly long if I write to a memory stream instead. (Edit: when I run this on a 7z LZMA2 archive with compression level = normal, it still takes 9 seconds.)

When I open the same archive in the actual 7zip application and extract the same file from there, it takes only around 2-3 seconds. I suspected it's some sort of multicore (7zip) vs. single-core (SharpCompress, probably?) thing, but I don't notice any CPU usage spike during decompression with 7zip. Maybe it's too fast to be noticeable, though.

Does anyone know what could be the cause of such slow speeds with SharpCompress? Am I maybe missing some setting, or using the wrong factory (ArchiveFactory)?

If not - is there any C# library out there that might be significantly faster at decompressing this?

For reference, here's a sketch of how I'm using SharpCompress to extract:

private void Extract()
{
    using (var archive = GetArchive())
    {
        var entryPath = /* ... path to entry ... */;
        var entry = TryGetEntry(archive, entryPath);
        entry.WriteToFile(some_target_path);
    }
}

private IArchive GetArchive()
{
    string path = /* ... path to my .7z file ... */;
    return ArchiveFactory.Open(path);
}

private IArchiveEntry TryGetEntry(IArchive archive, string path)
{
    path = path.Replace("\\", "/");

    foreach (var entry in archive.Entries)
    {
        if (!entry.IsDirectory && entry.Key == path)
            return entry;
    }

    return null;
}

Update: As a temporary solution, I'm now including 7zr.exe from the 7zip SDK in my application and running it in a new process to extract the single file, reading the process' output into a binary stream. This takes around ~3 seconds compared to the ~33 seconds with SharpCompress. It works for now, but it's kind of ugly, so I'm still curious why SharpCompress seems to be so slow here. A sketch of that workaround is below.
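In case it helps anyone doing the same: here is a rough sketch of that workaround, shelling out to 7zr.exe and capturing its stdout. The method name and paths are placeholders of my own, and it assumes 7zr.exe accepts the -so (write extracted data to stdout) switch the same way the full 7z.exe does.

// Requires System.Diagnostics and System.IO.
private MemoryStream ExtractWith7zr(string sevenZrExePath, string archivePath, string entryPath)
{
    // "e" extracts the named entry, "-so" streams the extracted bytes to stdout.
    var psi = new ProcessStartInfo
    {
        FileName = sevenZrExePath,
        Arguments = $"e \"{archivePath}\" -so \"{entryPath}\"",
        UseShellExecute = false,
        RedirectStandardOutput = true,
        CreateNoWindow = true
    };

    var result = new MemoryStream();
    using (var process = Process.Start(psi))
    {
        // Copy the raw binary stdout before waiting for exit, so a full
        // output buffer can't deadlock the child process.
        process.StandardOutput.BaseStream.CopyTo(result);
        process.WaitForExit();
    }
    result.Position = 0;
    return result;
}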

Bogey asked Apr 25 '17


1 Answer

This line is the problem:

foreach (var entry in archive.Entries)

The problem is described here: 7z archives are compressed as a solid stream, so random access to an entry means decompressing everything stored before it (i.e. if there are 100 files and you extract them one by one this way, it decompresses the 1st file 100 times, the 2nd file 99 times, and so on).

You need to use a reader (forward-only). See the API.
But the sample code there doesn't support 7z.

For 7z you can use archive.ExtractAllEntries(), e.g.:

var reader = archive.ExtractAllEntries();
while (reader.MoveToNextEntry())
{
    if (!reader.Entry.IsDirectory)
        reader.WriteEntryToDirectory(extractDir, new ExtractionOptions() { ExtractFullPath = false, Overwrite = true });
}

It will be much faster.
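If you only need one file, as in the question, a small variation keeps the forward-only reader but writes out just the matching entry. This is only a sketch: entryPath and targetPath are placeholders, and WriteEntryToFile is, as far as I can tell, one of the reader extension methods in the SharpCompress.Readers namespace.

using (var reader = archive.ExtractAllEntries())
{
    while (reader.MoveToNextEntry())
    {
        if (!reader.Entry.IsDirectory && reader.Entry.Key == entryPath)
        {
            // Write only the wanted entry, then stop reading the stream.
            reader.WriteEntryToFile(targetPath, new ExtractionOptions { Overwrite = true });
            break;
        }
    }
}

Breaking out of the loop avoids decompressing whatever is stored after the wanted entry.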

Aximili answered Nov 15 '22