I have used the code below to split a string into fixed-size segments, but it takes a lot of time.
using (StreamReader srSegmentData = new StreamReader(fileNamePath))
{
    string strSegmentData = "";
    string line = srSegmentData.ReadToEnd();
    int startPos = 0;
    ArrayList alSegments = new ArrayList();
    while (startPos < line.Length && (line.Length - startPos) >= segmentSize)
    {
        strSegmentData = strSegmentData + line.Substring(startPos, segmentSize) + Environment.NewLine;
        alSegments.Add(line.Substring(startPos, segmentSize) + Environment.NewLine);
        startPos = startPos + segmentSize;
    }
}
Please suggest an alternative way to split the string into smaller chunks of a fixed size.
On Linux you can split a file into pieces with the split command. By default it splits the input every 1000 lines and uses a very simple naming scheme: the output files get the prefix 'x' plus a generated suffix, so the chunks are named xaa, xab, xac, and so on (if you break up a sufficiently large file, you might even get chunks named xza and xzz). For example, split -l 1000 big.txt produces those xaa, xab, ... files, while split -b 1M big.bin splits by size instead of by lines.
To split a big binary file into multiple files, read the original file in blocks of the chunk size you want to create, write each block to its own output file, then read the next block, and repeat until you reach the end of the original file.
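A minimal sketch of that approach in C# (the class name, the chunk-file naming scheme, and the chunk size are illustrative, not taken from any particular answer):

using System.IO;

static class FileSplitter
{
    // Splits sourcePath into consecutive files of at most chunkSize bytes each.
    public static void SplitFile(string sourcePath, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        using (FileStream input = File.OpenRead(sourcePath))
        {
            int index = 0;
            int bytesRead;
            // For a plain file, Read() normally fills the buffer except at the
            // end, so each chunk file holds chunkSize bytes and the last one
            // holds whatever remains.
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                string chunkPath = sourcePath + ".part" + index;
                using (FileStream output = File.Create(chunkPath))
                    output.Write(buffer, 0, bytesRead);
                index++;
            }
        }
    }
}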
First of all you should define what you mean by chunk size. If you mean chunks with a fixed number of code units, then your current algorithm may be slow but it works. If that is not what you intend and you actually mean chunks with a fixed number of characters, then it's broken. I discussed a similar issue in this Code Review post: Split a string into chunks of the same length, so I will repeat here only the relevant parts.
You're partitioning over Char, but String is UTF-16 encoded, so you may produce broken strings in at least three cases:
- A character is encoded with more than one code unit (a surrogate pair); cutting between the two halves of the pair yields an invalid string.
- A character is composed of more than one code point, such as a base character followed by combining characters.
- A glyph is composed of multiple characters, for example "dž".Length > 1.
More about this and other cultural issues in How can I perform a Unicode aware character by character comparison?. One proposed (and untested) implementation may be this:
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StringExtensions
{
    public static IEnumerable<string> Split(this string value, int desiredLength)
    {
        var characters = StringInfo.GetTextElementEnumerator(value);
        while (characters.MoveNext())
            yield return String.Concat(Take(characters, desiredLength));
    }

    private static IEnumerable<string> Take(TextElementEnumerator enumerator, int count)
    {
        for (int i = 0; i < count; ++i)
        {
            yield return (string)enumerator.Current;

            // Advance only between elements of the same chunk: the MoveNext()
            // in Split() already steps to the first element of the next chunk.
            // Without this guard, one text element is skipped after each chunk.
            if (i < count - 1 && !enumerator.MoveNext())
                yield break;
        }
    }
}
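A quick usage sketch (assuming the StringExtensions class above is in scope; the sample string is illustrative):

string text = "abcdefg";
foreach (string chunk in text.Split(3))
    Console.WriteLine(chunk); // prints "abc", "def", "g" on separate lines

Because Split() walks text elements rather than raw chars, a surrogate pair or a base character plus its combining marks is never cut in half.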
It's not optimized for speed (as you can see, I tried to keep the code short and clear using enumerations) but, for big files, it still performs better than your implementation (see the next paragraph for the reason).
About your code, note that:
- You use ArrayList (?!) to hold the result. Also note that in this way you resize the ArrayList multiple times, even though, given the input size and the chunk size, its final size is known in advance.
- strSegmentData is rebuilt multiple times; if you need to accumulate characters you must use StringBuilder, otherwise each operation allocates a new string and copies the old value (it's slow and it also adds pressure on the Garbage Collector). A sketch applying both fixes follows this list.
- There are faster implementations (see the linked Code Review post, especially Heslacher's implementation, for a much faster version), and if you do not need to handle Unicode correctly (you're sure you manage only US-ASCII characters) there is also a pretty readable implementation from Jon Skeet (note that, after profiling your code, you may still improve its performance for big files by pre-allocating an output list of the right size). I do not repeat their code here, so please refer to the linked posts.
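For illustration, a minimal sketch of the same loop with those two fixes applied (line and segmentSize play the same roles as in your code; the sample values are placeholders):

using System;
using System.Collections.Generic;
using System.Text;

string line = "the text read from the file"; // as in your code
int segmentSize = 4;                         // fixed chunk size, as in your code

// Pre-size the list: the final number of chunks is known up front.
var segments = new List<string>(line.Length / segmentSize + 1);
// StringBuilder accumulates without allocating a new string per iteration.
var segmentData = new StringBuilder(line.Length);

for (int startPos = 0; startPos + segmentSize <= line.Length; startPos += segmentSize)
{
    string chunk = line.Substring(startPos, segmentSize);
    segments.Add(chunk);
    segmentData.AppendLine(chunk);
}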
In your specific case you do not need to read the entire huge file into memory: you can read/parse n characters at a time (don't worry too much about disk access, I/O is buffered). This will slightly degrade performance but greatly improve memory usage. Alternatively, you can read line by line (taking care to handle chunks that span lines). A sketch of the buffered approach follows.
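A minimal sketch of the buffered variant (fileNamePath and segmentSize as in your code; note that Read() counts char values, not text elements, so this gives up the Unicode-aware splitting shown earlier):

using System;
using System.IO;

string fileNamePath = "input.txt"; // placeholder path
int segmentSize = 1024;            // fixed chunk size, as in your code

using (var reader = new StreamReader(fileNamePath))
{
    var buffer = new char[segmentSize];
    while (true)
    {
        // Read() may return fewer characters than requested, so keep
        // filling the buffer until it is full or the file ends.
        int filled = 0;
        int read;
        while (filled < segmentSize &&
               (read = reader.Read(buffer, filled, segmentSize - filled)) > 0)
        {
            filled += read;
        }

        if (filled == 0)
            break; // end of file

        string chunk = new string(buffer, 0, filled);
        // ...process chunk; the last one may be shorter than segmentSize...
    }
}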