I need help converting a VERY LARGE binary file (ZIP file) to a Base64String and back again. The files are too large to be loaded into memory all at once (they throw OutOfMemoryExceptions) otherwise this would be a simple task. I do not want to process the contents of the ZIP file individually, I want to process the entire ZIP file.
The problem:
I can convert the entire ZIP file (test sizes vary from 1 MB to 800 MB at present) to Base64String, but when I convert it back, it is corrupted. The new ZIP file is the correct size, it is recognized as a ZIP file by Windows and WinRAR/7-Zip, etc., and I can even look inside the ZIP file and see the contents with the correct sizes/properties, but when I attempt to extract from the ZIP file, I get: "Error: 0x80004005" which is a general error code.
I am not sure where or why the corruption is happening. I have done some investigating, and I have noticed the following:
If you have a large text file, you can convert it to Base64String incrementally without issue. If calling Convert.ToBase64String
on the entire file yielded: "abcdefghijklmnopqrstuvwx", then calling it on the file in two pieces would yield: "abcdefghijkl" and "mnopqrstuvwx".
Unfortunately, if the file is a binary then the result is different. While the entire file might yield: "abcdefghijklmnopqrstuvwx", trying to process this in two pieces would yield something like: "oiweh87yakgb" and "kyckshfguywp".
Is there a way to incrementally base 64 encode a binary file while avoiding this corruption?
My code:
private void ConvertLargeFile()
{
FileStream inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read);
byte[] buffer = new byte[MultipleOfThree];
int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
while(bytesRead > 0)
{
byte[] secondaryBuffer = new byte[buffer.Length];
int secondaryBufferBytesRead = bytesRead;
Array.Copy(buffer, secondaryBuffer, buffer.Length);
bool isFinalChunk = false;
Array.Clear(buffer, 0, buffer.Length);
bytesRead = inputStream.Read(buffer, 0, buffer.Length);
if(bytesRead == 0)
{
isFinalChunk = true;
buffer = new byte[secondaryBufferBytesRead];
Array.Copy(secondaryBuffer, buffer, buffer.length);
}
String base64String = Convert.ToBase64String(isFinalChunk ? buffer : secondaryBuffer);
File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
}
inputStream.Dispose();
}
The decoding is more of the same. I use the size of the base64String
variable above (which varies depending on the original buffer size that I test with), as the buffer size for decoding. Then, instead of Convert.ToBase64String()
, I call Convert.FromBase64String()
and write to a different file name/path.
EDIT:
In my haste to reduce the code (I refactored it into a new project, separate from other processing to eliminate code that isn't central to the issue) I introduced a bug. The base 64 conversion should be performed on the secondaryBuffer
for all iterations save the last (Identified by isFinalChunk
), when buffer
should be used. I have corrected the code above.
EDIT #2:
Thank you all for your comments/feedback. After correcting the bug (see the above edit), I re-tested my code, and it is actually working now. I intend to test and implement @rene's solution as it appears to be the best, but I thought that I should let everyone know of my discovery as well.
Based on the code shown in the blog from Wiktor Zychla the following code works. This same solution is indicated in the remarks section of Convert.ToBase64String as pointed out by Ivan Stoev
// using System.Security.Cryptography
private void ConvertLargeFile()
{
//encode
var filein= @"C:\Users\test\Desktop\my.zip";
var fileout = @"C:\Users\test\Desktop\Base64Zip";
using (FileStream fs = File.Open(fileout, FileMode.Create))
using (var cs=new CryptoStream(fs, new ToBase64Transform(),
CryptoStreamMode.Write))
using(var fi =File.Open(filein, FileMode.Open))
{
fi.CopyTo(cs);
}
// the zip file is now stored in base64zip
// and decode
using (FileStream f64 = File.Open(fileout, FileMode.Open) )
using (var cs=new CryptoStream(f64, new FromBase64Transform(),
CryptoStreamMode.Read ) )
using(var fo =File.Open(filein +".orig", FileMode.Create))
{
cs.CopyTo(fo);
}
// the original file is in my.zip.orig
// use the commandlinetool
// fc my.zip my.zip.orig
// to verify that the start file and the encoded and decoded file
// are the same
}
The code uses standard classes found in System.Security.Cryptography namespace and uses a CryptoStream
and the FromBase64Transform
and its counterpart ToBase64Transform
You can avoid using a secondary buffer by passing offset and length to Convert.ToBase64String
, like this:
private void ConvertLargeFile()
{
using (var inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read))
{
byte[] buffer = new byte[MultipleOfThree];
int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
while(bytesRead > 0)
{
String base64String = Convert.ToBase64String(buffer, 0, bytesRead);
File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
bytesRead = inputStream.Read(buffer, 0, buffer.Length);
}
}
}
The above should work, but I think Rene's answer is actually the better solution.
Use this code:
public void ConvertLargeFile(string source , string destination)
{
using (FileStream inputStream = new FileStream(source, FileMode.Open, FileAccess.Read))
{
int buffer_size = 30000; //or any multiple of 3
byte[] buffer = new byte[buffer_size];
int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
while (bytesRead > 0)
{
byte[] buffer2 = buffer;
if(bytesRead < buffer_size)
{
buffer2 = new byte[bytesRead];
Buffer.BlockCopy(buffer, 0, buffer2, 0, bytesRead);
}
string base64String = System.Convert.ToBase64String(buffer2);
File.AppendAllText(destination, base64String);
bytesRead = inputStream.Read(buffer, 0, buffer.Length);
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With