I found 7-zip great and I will like to use it on .net applications. I have a 10MB file (a.001) and it takes:
2 seconds to encode.
Now it will be nice if I could do the same thing on c#. I have downloaded http://www.7-zip.org/sdk.html LZMA SDK c# source code. I basically copied the CS directory into a console application in visual studio:
Then I compiled and eveything compiled smoothly. So on the output directory I placed the file a.001
which is 10MB of size. On the main method that came on the source code I placed:
[STAThread] static int Main(string[] args) { // e stands for encode args = "e a.001 output.7z".Split(' '); // added this line for debug try { return Main2(args); } catch (Exception e) { Console.WriteLine("{0} Caught exception #1.", e); // throw e; return 1; } }
when I execute the console application the application works great and I get the output a.7z
on the working directory. The problem is that it takes so long. It takes about 15 seconds to execute! I have also tried https://stackoverflow.com/a/8775927/637142 approach and it also takes very long. Why is it 10 times slower than the actual program ?
Even if I set to use only one thread:
It still takes much less time (3 seconds vs 15):
Could it be because C# is slower than assembly or C ? I notice that the algorithm does a lot of heavy operations. For example compare these two blocks of code. They both do the same thing:
#include <time.h> #include<stdio.h> void main() { time_t now; int i,j,k,x; long counter ; counter = 0; now = time(NULL); /* LOOP */ for(x=0; x<10; x++) { counter = -1234567890 + x+2; for (j = 0; j < 10000; j++) for(i = 0; i< 1000; i++) for(k =0; k<1000; k++) { if(counter > 10000) counter = counter - 9999; else counter= counter +1; } printf (" %d \n", time(NULL) - now); // display elapsed time } printf("counter = %d\n\n",counter); // display result of counter printf ("Elapsed time = %d seconds ", time(NULL) - now); gets("Wait"); }
output
static void Main(string[] args) { DateTime now; int i, j, k, x; long counter; counter = 0; now = DateTime.Now; /* LOOP */ for (x = 0; x < 10; x++) { counter = -1234567890 + x + 2; for (j = 0; j < 10000; j++) for (i = 0; i < 1000; i++) for (k = 0; k < 1000; k++) { if (counter > 10000) counter = counter - 9999; else counter = counter + 1; } Console.WriteLine((DateTime.Now - now).Seconds.ToString()); } Console.Write("counter = {0} \n", counter.ToString()); Console.Write("Elapsed time = {0} seconds", DateTime.Now - now); Console.Read(); }
Output
Note how much slower was c#. Both programs where run from outside visual studio on release mode. Maybe that is the reason why it takes so much longer in .net than on c++.
I cannot seem to know what is causing the problem. I guess I will use 7z.dll and invoke the necessary methods from c#. A library that does that is at: http://sevenzipsharp.codeplex.com/ and that way I am using the same library that 7zip is using as:
// dont forget to add reference to SevenZipSharp located on the link I provided static void Main(string[] args) { // load the dll SevenZip.SevenZipCompressor.SetLibraryPath(@"C:\Program Files (x86)\7-Zip\7z.dll"); SevenZip.SevenZipCompressor compress = new SevenZip.SevenZipCompressor(); compress.CompressDirectory("MyFolderToArchive", "output.7z"); }
7-Zip uses all CPU cores of your system. Compressing large files will cause your CPU to run constantly with 100% on all cores. Therefore the heat is accumulating and may overload your cooling system after a certain time which forces the CPU to run slower and slower...
The LZMA SDK provides the documentation, samples, header files, libraries, and tools you need to develop applications that use LZMA compression.
I ran a profiler on the code, and the most expensive operation appears to be in searching for matches. In C#, it's searching a single byte at a time. There are two functions (GetMatches and Skip) in LzBinTree.cs which contain the following code snippet, and it spends something like 40-60% of its time on this code:
if (_bufferBase[pby1 + len] == _bufferBase[cur + len]) { while (++len != lenLimit) if (_bufferBase[pby1 + len] != _bufferBase[cur + len]) break;
It's basically trying to find the match length a single byte at a time. I extracted that into its own method:
if (GetMatchLength(lenLimit, cur, pby1, ref len)) {
And if you use unsafe code and cast the byte* to a ulong* and compare 8 bytes at a time instead of 1, the speed almost doubled for my test data (in a 64 bit process):
private bool GetMatchLength(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len) { if (_bufferBase[pby1 + len] != _bufferBase[cur + len]) return false; len++; // This method works with or without the following line, but with it, // it runs much much faster: GetMatchLengthUnsafe(lenLimit, cur, pby1, ref len); while (len != lenLimit && _bufferBase[pby1 + len] == _bufferBase[cur + len]) { len++; } return true; } private unsafe void GetMatchLengthUnsafe(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len) { const int size = sizeof(ulong); if (lenLimit < size) return; lenLimit -= size - 1; fixed (byte* p1 = &_bufferBase[cur]) fixed (byte* p2 = &_bufferBase[pby1]) { while (len < lenLimit) { if (*((ulong*)(p1 + len)) == *((ulong*)(p2 + len))) { len += size; } else return; } } }
This kind of binary-arithmetic and branching-heavy code is what C-compilers love and what the .NET JIT hates. The .NET JIT is not a very smart compiler. It is optimized for fast compilation. If Microsoft wanted to tune it for maximum performance they would plug in the VC++ backend, but then intentionally don't.
Also, I can tell by the speed you are getting with 7z.exe (6MB/s) that you are using multiple cores, probably using LZMA2. My fast core i7 can deliver 2MB/s per core so I guess 7z.exe is running multi-threaded for you. Try turning on threading in the 7zip-library if that is possible.
I recommend that instead of using the managed code LZMA-algorithm you either use a natively compiled library or call 7z.exe using Process.Start
. The latter one should get you started very quickly with good results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With