why is LZMA SDK (7-zip) so slow

Tags:

I found 7-zip great and I will like to use it on .net applications. I have a 10MB file (a.001) and it takes:

enter image description here

2 seconds to encode.

Now it will be nice if I could do the same thing on c#. I have downloaded http://www.7-zip.org/sdk.html LZMA SDK c# source code. I basically copied the CS directory into a console application in visual studio: enter image description here

Then I compiled and eveything compiled smoothly. So on the output directory I placed the file a.001 which is 10MB of size. On the main method that came on the source code I placed:

[STAThread]     static int Main(string[] args)     {         // e stands for encode         args = "e a.001 output.7z".Split(' '); // added this line for debug          try         {             return Main2(args);         }         catch (Exception e)         {             Console.WriteLine("{0} Caught exception #1.", e);             // throw e;             return 1;         }     }

when I execute the console application the application works great and I get the output a.7z on the working directory. The problem is that it takes so long. It takes about 15 seconds to execute! I have also tried https://stackoverflow.com/a/8775927/637142 approach and it also takes very long. Why is it 10 times slower than the actual program ?

Also

Even if I set to use only one thread: enter image description here

It still takes much less time (3 seconds vs 15):

(Edit) Another Possibility

Could it be because C# is slower than assembly or C ? I notice that the algorithm does a lot of heavy operations. For example compare these two blocks of code. They both do the same thing:

C

#include <time.h> #include<stdio.h>  void main() {     time_t now;       int i,j,k,x;     long counter ;      counter = 0;      now = time(NULL);      /* LOOP  */     for(x=0; x<10; x++)     {         counter = -1234567890 + x+2;          for (j = 0; j < 10000; j++)                  for(i = 0; i< 1000; i++)                                 for(k =0; k<1000; k++)                 {                     if(counter > 10000)                         counter = counter - 9999;                     else                         counter= counter +1;                 }          printf (" %d  \n", time(NULL) - now); // display elapsed time     }       printf("counter = %d\n\n",counter); // display result of counter              printf ("Elapsed time = %d seconds ", time(NULL) - now);     gets("Wait"); }

output

enter image description here

c#

static void Main(string[] args) {            DateTime now;      int i, j, k, x;     long counter;      counter = 0;      now = DateTime.Now;      /* LOOP  */     for (x = 0; x < 10; x++)     {         counter = -1234567890 + x + 2;          for (j = 0; j < 10000; j++)                         for (i = 0; i < 1000; i++)                                 for (k = 0; k < 1000; k++)                 {                     if (counter > 10000)                         counter = counter - 9999;                     else                         counter = counter + 1;                 }           Console.WriteLine((DateTime.Now - now).Seconds.ToString());                 }      Console.Write("counter = {0} \n", counter.ToString());     Console.Write("Elapsed time = {0} seconds", DateTime.Now - now);     Console.Read(); }

Output

enter image description here

Note how much slower was c#. Both programs where run from outside visual studio on release mode. Maybe that is the reason why it takes so much longer in .net than on c++.

Also I got the same results. C# was 3 times slower just like on the example I just showed!

Conclusion

I cannot seem to know what is causing the problem. I guess I will use 7z.dll and invoke the necessary methods from c#. A library that does that is at: http://sevenzipsharp.codeplex.com/ and that way I am using the same library that 7zip is using as:

    // dont forget to add reference to SevenZipSharp located on the link I provided     static void Main(string[] args)     {         // load the dll         SevenZip.SevenZipCompressor.SetLibraryPath(@"C:\Program Files (x86)\7-Zip\7z.dll");          SevenZip.SevenZipCompressor compress = new SevenZip.SevenZipCompressor();          compress.CompressDirectory("MyFolderToArchive", "output.7z");       }

391

asked Sep 06 '12 03:09

Tono Nam

2 Answers

I ran a profiler on the code, and the most expensive operation appears to be in searching for matches. In C#, it's searching a single byte at a time. There are two functions (GetMatches and Skip) in LzBinTree.cs which contain the following code snippet, and it spends something like 40-60% of its time on this code:

if (_bufferBase[pby1 + len] == _bufferBase[cur + len]) {     while (++len != lenLimit)         if (_bufferBase[pby1 + len] != _bufferBase[cur + len])             break;

It's basically trying to find the match length a single byte at a time. I extracted that into its own method:

if (GetMatchLength(lenLimit, cur, pby1, ref len)) {

And if you use unsafe code and cast the byte* to a ulong* and compare 8 bytes at a time instead of 1, the speed almost doubled for my test data (in a 64 bit process):

private bool GetMatchLength(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len) {     if (_bufferBase[pby1 + len] != _bufferBase[cur + len])         return false;     len++;      // This method works with or without the following line, but with it,     // it runs much much faster:     GetMatchLengthUnsafe(lenLimit, cur, pby1, ref len);      while (len != lenLimit         && _bufferBase[pby1 + len] == _bufferBase[cur + len])     {         len++;     }     return true; }  private unsafe void GetMatchLengthUnsafe(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len) {     const int size = sizeof(ulong);     if (lenLimit < size)         return;     lenLimit -= size - 1;     fixed (byte* p1 = &_bufferBase[cur])     fixed (byte* p2 = &_bufferBase[pby1])     {         while (len < lenLimit)         {             if (*((ulong*)(p1 + len)) == *((ulong*)(p2 + len)))             {                 len += size;             }             else                 return;         }     } }

answered Oct 12 '22 22:10

Bryce Wagner

This kind of binary-arithmetic and branching-heavy code is what C-compilers love and what the .NET JIT hates. The .NET JIT is not a very smart compiler. It is optimized for fast compilation. If Microsoft wanted to tune it for maximum performance they would plug in the VC++ backend, but then intentionally don't.

Also, I can tell by the speed you are getting with 7z.exe (6MB/s) that you are using multiple cores, probably using LZMA2. My fast core i7 can deliver 2MB/s per core so I guess 7z.exe is running multi-threaded for you. Try turning on threading in the 7zip-library if that is possible.

I recommend that instead of using the managed code LZMA-algorithm you either use a natively compiled library or call 7z.exe using Process.Start. The latter one should get you started very quickly with good results.

answered Oct 12 '22 20:10

usr

Related questions
                            
                                Finding my way through Scalaz [duplicate]
                            
                                Force AutoFixture to use the greediest constructor
                            
                                How to get Visual Studio performance tools to load symbols for "System.Core.ni.dll"
                            
                                Read and Write to Java file via Resource
                            
                                How do I compile for Windows XP with Visual Studio 2012?
                            
                                How can I reinstall a cassandra on ubuntu?
                            
                                Get Twitter Feed as JSON without authentication
                            
                                Visual studio 2012 slow unit testing
                            
                                Apache Multiple Sub Domains With One IP Address
                            
                                Visual Studio design view contains yellow warning icons?
                            
                                Testing error pages in Rails with Rspec + Capybara
                            
                                Rounded side, not rounded corners

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

why is LZMA SDK (7-zip) so slow

Tags:

Also

(Edit) Another Possibility

C

c#

Also I got the same results. C# was 3 times slower just like on the example I just showed!

Conclusion

Tono Nam

People also ask

2 Answers

Bryce Wagner

usr

Recent Activity

Donate For Us