
Reading and writing very large text files in C#

I have a very large file, almost 2 GB in size. I am trying to write a process that reads the file and writes it back out without the first row. So far I have only managed to read and write one line at a time, which takes forever. I can open the file, remove the first row and save it faster in TextPad, though even that is very slow.

I use this code to get the number of records in the file:

private long getNumRows(string strFileName)
{
    long lngNumRows = 0;
    string strMsg;

    try
    {
        lngNumRows = 0;
        using (var strReader = File.OpenText(@strFileName))
        {
            while (strReader.ReadLine() != null)
            {
                lngNumRows++;
            }
            // No explicit Close()/Dispose() needed: the using block disposes the reader.
        }
    }
    catch (Exception excExcept)
    {
        strMsg = "The File could not be read: ";
        strMsg += excExcept.Message;
        System.Windows.MessageBox.Show(strMsg);
        //Console.WriteLine("There was an error reading the file: ");
        //Console.WriteLine(excExcept.Message);

        //Console.ReadLine();
    }

    return lngNumRows;
}

This only takes seconds to run. When I add the following code it takes forever to run. Am I doing something wrong? Why does the write add so much time? Any ideas on how I can make this faster?

private void ProcessTextFiles(string strFileName)
{
    string strDataLine;
    string strFullOutputFileName;
    string strSubFileName;
    int intPos;
    long lngTotalRows = 0;
    long lngCurrNumRows = 0;
    long lngModNumber = 0;
    double dblProgress = 0;
    double dblProgressPct = 0;
    string strPrgFileName = "";
    string strOutName = "";
    string strMsg;
    long lngFileNumRows;

    try
    {
        using (StreamReader srStreamRdr = new StreamReader(strFileName))
        {
            while ((strDataLine = srStreamRdr.ReadLine()) != null)
            {
                lngCurrNumRows++;

                if (lngCurrNumRows > 1)
                {
                    WriteDataRow(strDataLine, strFullOutputFileName);
                }
            }
            // No explicit Dispose() needed: the using block disposes the reader.
        }
    }
    catch (Exception excExcept)
    {
        strMsg = "The File could not be read: ";
        strMsg += excExcept.Message;
        System.Windows.MessageBox.Show(strMsg);
        //Console.WriteLine("The File could not be read:");
        //Console.WriteLine(excExcept.Message);
    }
}

public void WriteDataRow(string strDataRow, string strFullFileName)
{
    //using (StreamWriter file = new StreamWriter(@strFullFileName, true, Encoding.GetEncoding("iso-8859-1")))
    using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
    {
        file.WriteLine(strDataRow);
        file.Close();
    }
}
asked Jun 09 '16 11:06 by Cass



1 Answer

I'm not sure how much this will improve performance, but opening and closing the output file for every line you write is certainly not a good idea. Each call to WriteDataRow creates a new StreamWriter, opens the file, writes a single line, flushes it and closes the file again.

Instead, open both files just once and write each line directly:

using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        lngCurrNumRows++;

        if (lngCurrNumRows > 1)
            file.WriteLine(strDataLine);
    }
}

You could also remove the check on lngCurrNumRows by simply doing one throwaway read before entering the while loop:

strDataLine = srStreamRdr.ReadLine();
if(strDataLine != null)
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null)
    {
        file.WriteLine(strDataLine);
    }
}
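
Putting the two ideas together, a self-contained version might look roughly like the sketch below. The class name FirstLineRemover, the method name CopyWithoutFirstLine and the 1 MiB buffer size are assumptions made for illustration, not part of the original code. Unlike the snippets above, this sketch overwrites the output file (append is false) instead of appending to it, and it assumes UTF-8 input.

```csharp
using System;
using System.IO;
using System.Text;

public static class FirstLineRemover
{
    // Hypothetical helper: copies inputPath to outputPath, leaving out the first line.
    // Returns the number of lines written.
    public static long CopyWithoutFirstLine(string inputPath, string outputPath)
    {
        // Assumed buffer size (1 MiB); measure before settling on a value.
        const int BufferSize = 1 << 20;
        long linesWritten = 0;

        // Both files are opened exactly once and stay open for the whole copy.
        using (var reader = new StreamReader(inputPath, Encoding.UTF8, true, BufferSize))
        using (var writer = new StreamWriter(outputPath, false, Encoding.UTF8, BufferSize))
        {
            // Throwaway read that discards the first line; null means the file is empty.
            if (reader.ReadLine() == null)
            {
                return 0;
            }

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
                linesWritten++;
            }
        }

        return linesWritten;
    }
}
```

It could be called as FirstLineRemover.CopyWithoutFirstLine(strFileName, strFullOutputFileName). Note that WriteLine ends every line with Environment.NewLine, so the line endings in the output may differ from those in the input.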
answered Sep 23 '22 16:09 by Steve