 

How to split large files efficiently

Tags: c#, .net

I'd like to know how I can split a large file without using too many system resources. I'm currently using this code:

    public static void SplitFile(string inputFile, int chunkSize, string path)
    {
        byte[] buffer = new byte[chunkSize];

        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(path + "\\" + index))
                {
                    int chunkBytesRead = 0;
                    while (chunkBytesRead < chunkSize)
                    {
                        int bytesRead = input.Read(buffer,
                                                   chunkBytesRead,
                                                   chunkSize - chunkBytesRead);

                        if (bytesRead == 0)
                        {
                            break;
                        }
                        chunkBytesRead += bytesRead;
                    }
                    output.Write(buffer, 0, chunkBytesRead);
                }
                index++;
            }
        }
    }

The operation takes 52.370 seconds to split a 1.6 GB file into 14 MB files. I'm not concerned about how long the operation takes; I'm more concerned about the system resources it uses, as this app will be deployed to a shared hosting environment. Currently this operation maxes out my system's HDD IO usage at 100% and slows my system down considerably. CPU usage is low; RAM ramps up a bit, but seems fine.

Is there a way I can restrict this operation from using too many resources?

Thanks

asked Oct 19 '10 at 10:10 by Bruce Adams




2 Answers

It seems odd to assemble each output file in memory; I suspect you should be using a small inner buffer (maybe 20 KB or so) and calling Write more frequently.

Ultimately, if you need IO, you need IO. If you want to be courteous to a shared hosting environment you could add deliberate pauses - maybe short pauses within the inner loop, and a longer pause (maybe 1s) in the outer loop. This won't affect your overall timing much, but may help other processes get some IO.

Example of using a small buffer in the inner loop:

    public static void SplitFile(string inputFile, int chunkSize, string path)
    {
        const int BUFFER_SIZE = 20 * 1024;
        byte[] buffer = new byte[BUFFER_SIZE];

        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(path + "\\" + index))
                {
                    int remaining = chunkSize, bytesRead;
                    while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                            Math.Min(remaining, BUFFER_SIZE))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                    }
                }
                index++;
                Thread.Sleep(500); // experimental; perhaps try it
            }
        }
    }
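The example above pauses only between output files. A minimal sketch that also adds the short pauses inside the inner loop, as the answer suggests, might look like this. The class name `ThrottledSplitter`, the `pauseEveryBytes` threshold and the 10 ms / 1 s defaults are assumptions to tune, not measured recommendations; it also uses `Path.Combine` instead of concatenating `"\\"`.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ThrottledSplitter
{
    // Hypothetical parameters (assumptions, not from the answer):
    //   bufferSize      - size of the small copy buffer
    //   pauseEveryBytes - how many bytes to copy before a short pause
    //   shortPause      - pause inside a chunk (default 10 ms)
    //   longPause       - pause between output files (default 1 s)
    // Returns the number of chunk files written.
    public static int SplitFile(string inputFile, int chunkSize, string path,
                                int bufferSize = 20 * 1024,
                                int pauseEveryBytes = 1024 * 1024,
                                TimeSpan? shortPause = null,
                                TimeSpan? longPause = null)
    {
        TimeSpan inner = shortPause ?? TimeSpan.FromMilliseconds(10);
        TimeSpan outer = longPause ?? TimeSpan.FromSeconds(1);
        byte[] buffer = new byte[bufferSize];
        int index = 0;

        using (Stream input = File.OpenRead(inputFile))
        {
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index.ToString())))
                {
                    int remaining = chunkSize;
                    int sinceLastPause = 0;
                    int bytesRead;
                    // Copy at most chunkSize bytes, bufferSize bytes at a time.
                    while (remaining > 0 &&
                           (bytesRead = input.Read(buffer, 0, Math.Min(remaining, bufferSize))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                        sinceLastPause += bytesRead;
                        if (sinceLastPause >= pauseEveryBytes)
                        {
                            Thread.Sleep(inner); // short pause inside the chunk
                            sinceLastPause = 0;
                        }
                    }
                }
                index++;
                if (input.Position < input.Length)
                    Thread.Sleep(outer); // longer pause between output files
            }
        }
        return index;
    }
}
```

For example, `ThrottledSplitter.SplitFile(@"D:\data\big.bin", 14 * 1024 * 1024, @"D:\data\parts")` (hypothetical paths) would write 14 MB parts, sleeping 10 ms after roughly every 1 MB copied and 1 s between parts. Sleeping does not reduce the total amount of IO; it only spreads it out so other processes get a turn at the disk.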
answered Sep 28 '22 at 04:09 by Marc Gravell


I have modified the code in the question a bit, in case you want to split into chunks while making sure each chunk ends on a line ending:

    private static void SplitFile(string inputFile, int chunkSize, string path)
    {
        byte[] buffer = new byte[chunkSize];
        List<byte> extraBuffer = new List<byte>();

        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(path + "\\" + index + ".csv"))
                {
                    int chunkBytesRead = 0;
                    while (chunkBytesRead < chunkSize)
                    {
                        int bytesRead = input.Read(buffer,
                                                   chunkBytesRead,
                                                   chunkSize - chunkBytesRead);

                        if (bytesRead == 0)
                        {
                            break;
                        }

                        chunkBytesRead += bytesRead;
                    }

                    // Keep reading past the chunk boundary until the chunk ends on '\n'.
                    // Check the last byte actually read: the final chunk may be shorter
                    // than chunkSize, so buffer[chunkSize - 1] could be stale data.
                    byte extraByte = buffer[chunkBytesRead - 1];
                    while (extraByte != '\n')
                    {
                        int flag = input.ReadByte();
                        if (flag == -1)
                            break;
                        extraByte = (byte)flag;
                        extraBuffer.Add(extraByte);
                    }

                    output.Write(buffer, 0, chunkBytesRead);
                    if (extraBuffer.Count > 0)
                        output.Write(extraBuffer.ToArray(), 0, extraBuffer.Count);

                    extraBuffer.Clear();
                }
                index++;
            }
        }
    }
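The line-aware version above still allocates a full `chunkSize` buffer, which is what the first answer suggests avoiding. A sketch combining the two ideas could stream each chunk through a small buffer and then copy single bytes until the next `'\n'`. It assumes a byte-oriented encoding such as ASCII or UTF-8, where `'\n'` is a single 0x0A byte (a `\r\n` ending is also covered, since it ends in `'\n'`); `LineAlignedSplitter` and the 20 KB default are hypothetical.

```csharp
using System;
using System.IO;

public static class LineAlignedSplitter
{
    // Writes chunks of at least chunkSize bytes (except the last), each extended
    // up to and including the next '\n' so no line is split across files.
    // Returns the number of chunk files written.
    public static int SplitFile(string inputFile, int chunkSize, string path,
                                int bufferSize = 20 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        int index = 0;

        using (Stream input = File.OpenRead(inputFile))
        {
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index + ".csv")))
                {
                    int remaining = chunkSize;
                    int lastByte = -1; // last byte written to this chunk
                    int bytesRead;
                    // Copy up to chunkSize bytes through the small buffer.
                    while (remaining > 0 &&
                           (bytesRead = input.Read(buffer, 0, Math.Min(remaining, bufferSize))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                        lastByte = buffer[bytesRead - 1];
                    }

                    // Extend the chunk byte by byte until it ends on '\n' or EOF.
                    while (lastByte != -1 && lastByte != '\n')
                    {
                        lastByte = input.ReadByte();
                        if (lastByte != -1)
                            output.WriteByte((byte)lastByte);
                    }
                }
                index++;
            }
        }
        return index;
    }
}
```

The byte-at-a-time tail is cheap here because `FileStream` buffers reads internally, so `ReadByte` does not hit the disk once per byte.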
answered Sep 28 '22 at 03:09 by Michael Bahig