How to split a large text file (32 GB) using C#

Tags:

c#

I tried to split a file of about 32 GB using the code below, but I got an out-of-memory exception.

Please suggest how to split the file using C#.

string[] splitFile = File.ReadAllLines(@"E:\JKS\ImportGenius\0.txt");

int cycle = 1;
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
var chunk = splitFile.Take(splitSize);
var rem = splitFile.Skip(splitSize);

while (chunk.Take(1).Count() > 0)
{
    string filename = "file" + cycle.ToString() + ".txt";
    using (StreamWriter sw = new StreamWriter(filename))
    {
        foreach (string line in chunk)
        {
            sw.WriteLine(line);
        }
    }
    chunk = rem.Take(splitSize);
    rem = rem.Skip(splitSize);
    cycle++;
}
Jaffer asked Jul 26 '12 11:07


2 Answers

Well, to start with you need to use File.ReadLines (assuming you're using .NET 4 or later) instead of File.ReadAllLines, so that it streams the lines lazily rather than trying to read the whole thing into memory. Then I'd just keep calling a method to write the "next" however many lines to a new file:

int splitSize = Convert.ToInt32(txtNoOfLines.Text);
using (var lineIterator = File.ReadLines(...).GetEnumerator())
{
    bool stillGoing = true;
    for (int chunk = 0; stillGoing; chunk++)
    {
        stillGoing = WriteChunk(lineIterator, splitSize, chunk);
    }
}

...

private static bool WriteChunk(IEnumerator<string> lineIterator,
                               int splitSize, int chunk)
{
    using (var writer = File.CreateText("file " + chunk + ".txt"))
    {
        for (int i = 0; i < splitSize; i++)
        {
            if (!lineIterator.MoveNext())
            {
                return false;
            }
            writer.WriteLine(lineIterator.Current);
        }
    }
    return true;
}
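
One edge case worth noting in the code above: WriteChunk creates the output file before it knows whether any lines are left, so when the number of lines is an exact multiple of splitSize (or the input is empty), the last file comes out empty. The following is a minimal sketch of one way to avoid that, by advancing the iterator before each file is created. The class name LineSplitter, the method name Split, and the outputPrefix parameter are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSplitter
{
    // Splits inputPath into files of at most splitSize lines each and
    // returns the number of files written. All names here are illustrative.
    public static int Split(string inputPath, int splitSize, string outputPrefix)
    {
        if (splitSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(splitSize));

        int chunk = 0;
        // File.ReadLines is lazy, so only the current line is held in memory.
        using (IEnumerator<string> lines = File.ReadLines(inputPath).GetEnumerator())
        {
            // Advance before creating each file, so a file is only created
            // when there is at least one line to put in it.
            bool hasLine = lines.MoveNext();
            while (hasLine)
            {
                using (StreamWriter writer = File.CreateText(outputPrefix + chunk + ".txt"))
                {
                    for (int i = 0; i < splitSize && hasLine; i++)
                    {
                        writer.WriteLine(lines.Current);
                        hasLine = lines.MoveNext();
                    }
                }
                chunk++;
            }
        }
        return chunk;
    }
}
```

A call such as LineSplitter.Split(inputPath, splitSize, "file ") would produce the same "file 0.txt", "file 1.txt", ... naming as the code above, minus the possible empty trailing file.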
Jon Skeet answered Oct 24 '22 06:10


Don't read all the lines into an array up front. Use the StreamReader.ReadLine method instead, like:

using (StreamReader sr = new StreamReader(@"E:\JKS\ImportGenius\0.txt"))
{
    string fileLine;
    // ReadLine returns null at end of file, so only one line is in memory at a time.
    while ((fileLine = sr.ReadLine()) != null)
    {
        // do something with line
    }
}
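
To turn the "do something with line" step into an actual split, one option is to count lines as they are read and switch to a new StreamWriter every N lines. The sketch below assumes output files named with a prefix plus a counter starting at 1, as in the question. The class name StreamSplitter, the method name SplitByLines, and its parameters are hypothetical.

```csharp
using System;
using System.IO;

public static class StreamSplitter
{
    // Copies inputPath into numbered files of at most linesPerFile lines each
    // and returns how many files were written. All names here are illustrative.
    public static int SplitByLines(string inputPath, int linesPerFile, string outputPrefix)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileCount = 0;
        int linesInCurrentFile = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Open the next output file only when there is a line to write.
                    if (writer == null)
                    {
                        fileCount++;
                        writer = new StreamWriter(outputPrefix + fileCount + ".txt");
                        linesInCurrentFile = 0;
                    }

                    writer.WriteLine(line);
                    linesInCurrentFile++;

                    // Close the file once it holds the requested number of lines.
                    if (linesInCurrentFile == linesPerFile)
                    {
                        writer.Dispose();
                        writer = null;
                    }
                }
            }
        }
        finally
        {
            // Flush and close a partially filled last file, if any.
            if (writer != null) writer.Dispose();
        }
        return fileCount;
    }
}
```

For example, StreamSplitter.SplitByLines(inputPath, splitSize, "file") would write file1.txt, file2.txt, and so on, holding only one line in memory at a time.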
Tigran answered Oct 24 '22 06:10