
Is there a better way to determine the number of lines in a large txt file (1-2 GB)? [duplicate]

Tags: c#, .net

I am trying to count all the lines in a txt file using a StreamReader:

public int countLines(string path)
{
    var watch = System.Diagnostics.Stopwatch.StartNew();
    int nlines = 0;
    string line;
    // Dispose the reader when done so the file handle is released.
    using (StreamReader file = new StreamReader(path))
    {
        while ((line = file.ReadLine()) != null)
        {
            nlines++;
        }
    }
    watch.Stop();
    var elapsedMs = watch.ElapsedMilliseconds;
    Console.Write(elapsedMs);
    // elapsedMs = 3520  --- Tested with a 1.2 Mill txt
    return nlines;
}

Is there a more efficient way to count the number of lines?

Brayan Henao asked Apr 01 '16 23:04



1 Answer

You already have an appropriate solution, but you can simplify your code to:

var lineCount = File.ReadLines(@"C:\MyHugeFile.txt").Count();
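
As a minimal, self-contained sketch of the one-liner: the version below assumes a throwaway temp file rather than the path above, and the class and method names are only illustrative. File.ReadLines enumerates the file lazily, so the whole file is never held in memory. LongCount() is used here purely as a precaution for files whose line count could exceed int.MaxValue; Count() is fine for smaller files.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LineCountDemo
{
    // Counts lines by streaming them; File.ReadLines yields one line at a time.
    public static long CountLines(string path) => File.ReadLines(path).LongCount();

    public static void Main()
    {
        // Hypothetical sample file, created only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "first", "second", "third" });
            Console.WriteLine(CountLines(path)); // prints 3
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```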

Benchmarks

I am not sure how dreamlax achieved his benchmark results, but here is a benchmark that anyone can reproduce on their own machine; you can copy and paste it into LINQPad.

First let us prepare our input file:

var filePath = @"c:\MyHugeFile.txt";

for (int counter = 0; counter < 5; counter++)
{
    var lines = new string[30000000];

    for (int i = 0; i < lines.Length; i++)
    {
        lines[i] = $"This is a line with a value of: {i}";
    }

    File.AppendAllLines(filePath, lines);
}

This should produce a file of 150 million lines, roughly 6 GB in size.
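
The size figure can be sanity-checked with a little arithmetic. The sketch below is an illustration, not part of the original benchmark: it assumes the text is written as UTF-8 without a BOM (so each ASCII character is one byte) and takes the newline length as a parameter, 2 for "\r\n" on Windows and 1 for "\n". The names FileSizeEstimate, EstimateBytes, and DigitCount are hypothetical helpers.

```csharp
using System;

public static class FileSizeEstimate
{
    // Estimates the bytes written by the generation loop above.
    // newlineLength is an assumption: 2 for "\r\n" (Windows), 1 for "\n".
    public static long EstimateBytes(int linesPerBatch, int batches, int newlineLength)
    {
        const string prefix = "This is a line with a value of: ";
        long perBatch = 0;
        for (int i = 0; i < linesPerBatch; i++)
        {
            // All characters are ASCII, so each takes one byte in UTF-8.
            perBatch += prefix.Length + DigitCount(i) + newlineLength;
        }
        return perBatch * batches;
    }

    // Number of decimal digits in a non-negative integer.
    private static int DigitCount(int value)
    {
        int digits = 1;
        while (value >= 10) { value /= 10; digits++; }
        return digits;
    }

    public static void Main()
    {
        long bytes = EstimateBytes(30000000, 5, 2);
        Console.WriteLine("{0:n0} bytes (~{1:f1} GB)", bytes, bytes / 1e9);
    }
}
```

With two-byte newlines this works out to a little over 6 GB, which is consistent with the figure quoted above.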

Now let us run each method:

void Main()
{
    var filePath = @"c:\MyHugeFile.txt";
    // Make sure you clear windows cache!
    UsingFileStream(filePath);

    // Make sure you clear windows cache!
    UsingStreamReaderLinq(filePath);

    // Make sure you clear windows cache!
    UsingStreamReader(filePath);
}

private void UsingFileStream(string path)
{
    var sw = Stopwatch.StartNew();
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        long lineCount = 0;
        byte[] buffer = new byte[1024 * 1024];
        int bytesRead;

        do
        {
            bytesRead = fs.Read(buffer, 0, buffer.Length);
            for (int i = 0; i < bytesRead; i++)
                if (buffer[i] == '\n')
                    lineCount++;
        }
        while (bytesRead > 0);       
        Console.WriteLine("[FileStream] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
    }
}

private void UsingStreamReaderLinq(string path)
{
    var sw = Stopwatch.StartNew();
    var lineCount = File.ReadLines(path).Count();
    Console.WriteLine("[StreamReader+LINQ] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
}

private void UsingStreamReader(string path)
{
    var sw = Stopwatch.StartNew();
    long lineCount = 0;
    string line;
    using (var file = new StreamReader(path))
    {
        while ((line = file.ReadLine()) != null) { lineCount++; }
        Console.WriteLine("[StreamReader] - Read: {0:n0} in {1}", lineCount, sw.Elapsed);
    }
}

Which results in:

[FileStream] - Read: 150,000,000 in 00:00:37.3397443

[StreamReader+LINQ] - Read: 150,000,000 in 00:00:33.8842190

[StreamReader] - Read: 150,000,000 in 00:00:34.2102178

Update

Running with compiler optimizations turned on results in:

[FileStream] - Read: 150,000,000 in 00:00:18.1636374

[StreamReader+LINQ] - Read: 150,000,000 in 00:00:33.3173354

[StreamReader] - Read: 150,000,000 in 00:00:32.3530890
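
One caveat when comparing the methods: the FileStream variant counts '\n' bytes, while ReadLine counts lines, so the two differ by one when the last line of a file has no trailing newline. The benchmark file ends with a newline, so the counts match there. The following is a sketch of a byte-level counter that also counts an unterminated final line. It assumes LF or CRLF line endings (no bare '\r'), and the ByteLineCounter class with its last-byte tracking is an illustrative addition, not part of the original benchmark.

```csharp
using System;
using System.IO;

public static class ByteLineCounter
{
    public static long CountLines(string path)
    {
        long lineCount = 0;
        // Start as if the previous byte were a newline, so an empty file counts as 0 lines.
        byte lastByte = (byte)'\n';
        byte[] buffer = new byte[1024 * 1024];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == (byte)'\n')
                        lineCount++;
                }
                // Remember the final byte of this chunk for the check below.
                lastByte = buffer[bytesRead - 1];
            }
        }

        // A final line without a trailing newline still counts as a line.
        if (lastByte != (byte)'\n')
            lineCount++;

        return lineCount;
    }

    public static void Main()
    {
        // Hypothetical sample file, created only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "one\ntwo\nthree"); // no trailing newline
            Console.WriteLine(CountLines(path)); // prints 3
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```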

MaYaN answered Nov 11 '22 23:11