Out of Memory Exception when handling large files in C#

I have a C# WinForms application in which I use an OpenFileDialog (with multiple selection allowed) to let users choose text files to open. Once they select the files, I open them one by one, read the text, and store the contents in a List by calling List.Add().
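For reference, the selection flow looks roughly like this (a sketch; the dialog settings and variable names are illustrative, not my exact code):

using System.Collections.Generic;
using System.Windows.Forms;

// Sketch: a multi-select dialog; FileNames then holds every selected path.
using (OpenFileDialog dialog = new OpenFileDialog())
{
    dialog.Multiselect = true;                   // allow selecting many files at once
    dialog.Filter = "Text files (*.txt)|*.txt";  // assumed filter
    if (dialog.ShowDialog() == DialogResult.OK)
    {
        List<LoadData> masterData = LoadDataFromFile(dialog.FileNames);
    }
}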

My problem occurs when the user selects an unusually large number of text files, e.g. 1264 files totaling about 750 MB: the program cannot handle it. It reads roughly 850 files and then throws an out of memory exception. In Task Manager, my application's memory (private working set) is around 1.5 GB when this happens. I am on an x64 machine with 32 GB of RAM.

Here is the code that reads through the files:

public static List<LoadData> LoadDataFromFile(string[] filenames)
{
    List<LoadData> MasterData = new List<LoadData>();
    lookingForJobs = new LookingForJobs(1,filenames.Length);
    lookingForJobs.Show();
    /*-------OUTER LOOP TO GO THROUGH ALL THE FILES-------*/
    for (int index = 0; index < filenames.Length; index++)
    {
        string path = filenames[index];
        /*----------INNER LOOP TO GO THROUGH THE CONTENTS OF EACH FILE------*/
        foreach (string line in File.ReadAllLines(path))
        {
            string[] columns = line.Split('\t');
            if (columns.Length == 9)
            {
                if (line.StartsWith("<"))    /*-------IGNORING THE FIRST 8 LINES OF EACH LOG FILE CONTAINING THE LOGGER INFO---------*/
                {
                    MasterData.Add(new LoadData
                    {
                        Event_Type = columns[0],
                        Timestamp = columns[1],
                        Log_Message = columns[2],
                        Category = columns[3],
                        User = columns[4],
                        Thread_ID = columns[5],
                        Error_Code = columns[6],
                        Application = columns[7],
                        Machine = columns[8]
                    });
                }
            }
        }
        lookingForJobs.SearchingForJobsProgress.PerformStep();
        /*--------END OF INNER LOOP--------*/
    }
    lookingForJobs.Dispose();
    /*-----------END OF OUTER LOOP-----*/
    return MasterData;
}

Edit: I understand that I should probably redesign my code so that not all the files are read into the object at once. But I want to know whether there is any limit on the size of the List object or on the process memory (private working set). I have read in a few articles that these kinds of problems tend to appear around 1.5-1.6 GB.
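For what it's worth, a ceiling around 1.5 GB is characteristic of a 32-bit process (for example, a WinForms project built with "Prefer 32-bit" enabled), so one quick sanity check is whether the process is actually running as 64-bit; a minimal sketch:

using System;

// A 32-bit process typically hits OutOfMemoryException around 1.2-1.6 GB
// due to address-space fragmentation, even on a 64-bit OS with plenty of RAM.
Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);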

asked Jan 22 '26 by Kaushik

1 Answer

Use File.ReadLines instead of File.ReadAllLines, as the latter unnecessarily loads the whole file into memory when you only need one line at a time. MSDN says:

When you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.

This will probably give you quite a big memory improvement.
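The change is a one-line substitution in the inner loop; a minimal sketch, reusing the path variable and LoadData type from the question:

// File.ReadAllLines builds a string[] holding every line of the file.
// File.ReadLines returns a lazy IEnumerable<string>, so only the line
// currently being enumerated is kept in memory.
foreach (string line in File.ReadLines(path))
{
    string[] columns = line.Split('\t');
    if (columns.Length == 9 && line.StartsWith("<"))
    {
        MasterData.Add(new LoadData
        {
            Event_Type = columns[0],
            // ... columns[1] through columns[7] mapped as in the question ...
            Machine = columns[8]
        });
    }
}

Note that the parsed records still accumulate in MasterData; ReadLines only eliminates the temporary per-file arrays.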

The second thought is to reconsider whether you really need all of this data in memory at once. Perhaps you could store just the path to each file and read the contents on demand.
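If the records do need to be traversed as one sequence, a hedged alternative (the method name is illustrative, not from the question) is a lazy iterator that yields one record at a time instead of building a list:

using System.Collections.Generic;
using System.IO;

public static IEnumerable<LoadData> StreamDataFromFiles(string[] filenames)
{
    foreach (string path in filenames)
    {
        foreach (string line in File.ReadLines(path))
        {
            string[] columns = line.Split('\t');
            if (columns.Length == 9 && line.StartsWith("<"))
            {
                // yield return hands back one record at a time; nothing is
                // accumulated, so memory stays flat regardless of file count.
                yield return new LoadData
                {
                    Event_Type = columns[0],
                    Timestamp = columns[1],
                    Log_Message = columns[2],
                    Category = columns[3],
                    User = columns[4],
                    Thread_ID = columns[5],
                    Error_Code = columns[6],
                    Application = columns[7],
                    Machine = columns[8]
                };
            }
        }
    }
}

Callers can then filter, count, or page through the records with LINQ without ever materializing all 750 MB at once.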

answered Jan 23 '26 by Konrad Kokosa


