Out-of-memory error while reading a very large text file in VB.NET

I've been tasked with processing a 3.2 GB fixed-width text file. Each line is 1563 characters long, and there are approximately 2.1 million lines in the file. After reading about 1 million lines, my program crashes with an OutOfMemoryException.

Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module TestFileCount
    ''' <summary>
    ''' Gets the total number of lines in a text file by reading a line at a time
    ''' </summary>
    ''' <remarks>Crashes when count reaches 1018890</remarks>
    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Dim count As Int32 = 0
        Dim lineoftext As String = ""

        If File.Exists(inputfile) Then
            Dim _read As New StreamReader(inputfile)
            Try
                While (_read.Peek <> -1)
                    lineoftext = _read.ReadLine()
                    count += 1
                End While

                Console.WriteLine("Total Lines in " & inputfile & ": " & count)
            Catch ex As Exception
                Console.WriteLine(ex.Message)
            Finally
                _read.Close()
            End Try
        End If
    End Sub
End Module

It's a pretty straightforward program that reads the text file one line at a time, so I assume it shouldn't take up too much memory in the buffer.

For the life of me, I can't figure out why it's crashing. Does anyone here have any ideas?

asked Nov 12 '22 12:11 by Spacehamster


1 Answer

I don't know if this will fix your problem, but don't use Peek; change your loop to this (it's C#, but you should be able to translate it to VB):

while (_read.ReadLine() != null)
{
    count += 1;
}

If you need to use the line of text inside the loop instead of just counting lines, modify the code to:

while ((lineoftext = _read.ReadLine()) != null)
{
    count += 1;
    //Do something with lineoftext
}
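A VB.NET translation of the loop above might look like the following sketch (it is not part of the original answer; the module name LineCounter and the helper CountLines are hypothetical). VB.NET has no C#-style assignment inside a condition, because = in an expression is a comparison, so the sketch reads one line before the loop and the next line at the bottom of each iteration. A Using block replaces the Try/Finally that closed the reader.

```vbnet
Imports System
Imports System.IO

Module LineCounter
    ' Hypothetical helper: counts lines by calling ReadLine until it returns
    ' Nothing, instead of testing Peek before each read.
    Function CountLines(fileName As String) As Long
        Dim count As Long = 0
        ' Using closes (disposes) the reader even if an exception is thrown.
        Using reader As New StreamReader(fileName)
            Dim lineOfText As String = reader.ReadLine()
            While lineOfText IsNot Nothing
                count += 1
                ' Do something with lineOfText here.
                lineOfText = reader.ReadLine()
            End While
        End Using
        Return count
    End Function

    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        If File.Exists(inputfile) Then
            Console.WriteLine("Total Lines in " & inputfile & ": " & CountLines(inputfile))
        End If
    End Sub
End Module
```

As in the C# version, this only changes how end-of-file is detected; whether it avoids the OutOfMemoryException is untested.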

This is somewhat off topic and a bit of a cheat, but if each line really is 1563 bytes long (including the line ending) and the file is pure ASCII (so every character takes one byte), you could simply divide the file size by the line length (again in C#, but you should be able to translate it):

using System;
using System.IO;

long bytesPerLine = 1563;
string inputfile = @"C:\Split\BIGFILE.txt"; //The @ symbol is so we don't have to escape the `\`
long length;

//File.OpenRead asks only for read access; File.Open(path, FileMode.Open) asks for read/write and fails on a read-only file.
using (FileStream stream = File.OpenRead(inputfile)) //This is the C# equivalent of the try/finally that closes the stream when done.
{
    length = stream.Length;
}

Console.WriteLine("Total Lines in {0}: {1}", inputfile, length / bytesPerLine);
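The same shortcut in VB.NET could be sketched as follows (not part of the original answer; LineEstimate and EstimateLines are hypothetical names). It reads the size from FileInfo.Length without opening a stream and keeps the answer's assumptions: 1563 bytes per line including the line ending, one byte per character. If the 1563 characters exclude a CRLF terminator, each line would take 1565 bytes instead. The sketch also reports the remainder. A non-zero remainder suggests that one of the assumptions does not hold, or that the last line has no line ending.

```vbnet
Imports System
Imports System.IO

Module LineEstimate
    ' Assumption carried over from the answer: every line, including its
    ' line ending, occupies exactly this many bytes (single-byte ASCII).
    Const BytesPerLine As Long = 1563

    ' Hypothetical helper: returns the line count implied by the file size
    ' and passes back the leftover bytes that do not fill a whole line.
    Function EstimateLines(fileName As String, ByRef remainder As Long) As Long
        Dim length As Long = New FileInfo(fileName).Length
        remainder = length Mod BytesPerLine
        Return length \ BytesPerLine   ' \ is integer division in VB.NET
    End Function

    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        If Not File.Exists(inputfile) Then Return

        Dim remainder As Long
        Dim lines As Long = EstimateLines(inputfile, remainder)
        Console.WriteLine("Total Lines in {0}: {1}", inputfile, lines)
        If remainder <> 0 Then
            Console.WriteLine("Warning: {0} leftover bytes; the fixed-length assumption may not hold.", remainder)
        End If
    End Sub
End Module
```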
answered Nov 15 '22 05:11 by Scott Chamberlain