Are there any tricks for counting the number of lines in a text file? [closed]

Say you have a text file - what's the fastest and/or most memory efficient way to determine the number of lines of text in that file?

Is it simply a matter of scanning through it character by character and looking for newline characters?

asked Oct 09 '09 at 18:10 by xyz


2 Answers

Probably not the fastest but it will be the most versatile...

int lines = 0;
/* If you need an encoding other than UTF-8, you may want to try...
   new StreamReader("myFile.txt", yourEncoding)
   ... instead of File.OpenText("myFile.txt")
*/
using (var fs = File.OpenText("myFile.txt"))
    while (!fs.EndOfStream)
    {
        fs.ReadLine();
        lines++;
    }

... the version below will probably be faster. If you need even more speed, you might try unrolling the loop (in the style of Duff's device) so that 10 or 20 bytes are checked before each loop branch ...

int lines = 0;
var buffer = new byte[32768];
int bufferLen;
using (var fs = File.OpenRead("filename.txt"))
    while ((bufferLen = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        /* This is only known to work for UTF-8/ASCII; other
           encodings may need to search for different end-of-line
           bytes. */
        for (int i = 0; i < bufferLen; i++)
            if (buffer[i] == (byte)'\n')   // 10 = line feed
                lines++;
    }
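One thing to watch for: the byte-scanning loop above counts newline bytes, while the ReadLine() loop counts lines, so the two disagree by one when the file does not end with a newline. A minimal sketch of one way to reconcile them, assuming a trailing unterminated chunk should count as a line (the class and method names here are made up for illustration, and like the loop above it only looks for the byte 10):

```csharp
using System.IO;

static class ByteLineCounter
{
    // Counts lines by scanning raw bytes for '\n' (byte value 10).
    // A final line with no trailing newline is counted as a line too.
    public static int CountLines(Stream stream)
    {
        var buffer = new byte[32768];
        int lines = 0;
        int bytesRead;
        byte last = (byte)'\n';   // so that an empty stream yields zero lines

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
                if (buffer[i] == (byte)'\n')
                    lines++;
            last = buffer[bytesRead - 1];   // remember the final byte seen so far
        }

        // Assumption: any bytes after the last '\n' form one more line.
        if (last != (byte)'\n')
            lines++;
        return lines;
    }
}
```

You would call it as ByteLineCounter.CountLines(File.OpenRead("filename.txt")) inside a using block. Files that use a lone \r as the line terminator would still be miscounted by this sketch.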
answered Sep 22 '22 at 01:09 by Matthew Whited


Unless you've got a fixed line length (in terms of bytes) you'll definitely need to read the data. Whether you can avoid converting all the data into text or not will depend on the encoding.
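As a rough sketch of the fixed-line-length case: if every line occupies exactly the same number of bytes (terminator included), the count falls out of the file size alone and nothing needs to be read. The names FixedWidthLineCounter and bytesPerLine are hypothetical, and the rounding assumes that only the final line may be missing its terminator:

```csharp
using System;
using System.IO;

static class FixedWidthLineCounter
{
    // bytesPerLine is the full record size in bytes, including the line terminator.
    public static long CountLines(string path, int bytesPerLine)
    {
        if (bytesPerLine <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerLine));

        // File size comes from file system metadata; no content is read.
        long length = new FileInfo(path).Length;

        // Round up so a final record without its terminator still counts.
        return (length + bytesPerLine - 1) / bytesPerLine;
    }
}
```

This only holds when the line length is fixed in bytes, not characters, so a variable-width encoding such as UTF-8 with non-ASCII content would break the assumption.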

Now the most efficient way will be reinier's - counting line endings manually. However, the simplest code would be to use TextReader.ReadLine(). And in fact, the simplest way of doing that would be to use my LineReader class from MiscUtil, which converts a filename (or various other things) into an IEnumerable<string>. You can then just use LINQ:

int lines = new LineReader(filename).Count();

(If you don't want to grab the whole of MiscUtil, you can get just LineReader on its own from this answer.)

Now that will create a lot of garbage which repeatedly reading into the same char array wouldn't - but it won't read more than one line at a time, so while you'll be stressing the GC a bit, it's not going to blow up with large files. It will also require decoding all the data into text - which you may be able to get away without doing for some encodings.

Personally, that's the code I'd use until I found that it caused a bottleneck - it's a lot simpler to get right than doing it manually. Do you absolutely know that in your current situation, code like the above will be the bottleneck?

As ever, don't micro-optimise until you have to... and you can very easily optimise this at a later date without changing your overall design, so postponing it isn't going to do any harm.
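If you are on .NET 4 or later, a similar one-liner can be sketched with the built-in File.ReadLines, which also exposes the file as a lazily evaluated IEnumerable<string>. To be clear, this is a swapped-in alternative rather than the MiscUtil LineReader, and the wrapper class name is made up for illustration:

```csharp
using System.IO;
using System.Linq;
using System.Text;

static class BuiltInLineCounter
{
    public static int CountLines(string path, Encoding encoding)
    {
        // File.ReadLines streams the file lazily, so only one line is
        // materialised at a time; Count() just enumerates to the end.
        return File.ReadLines(path, encoding).Count();
    }
}
```

It has the same trade-offs as described above: every line is decoded and allocated as a string, which is simple and safe for large files but not the fastest option.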

EDIT: To convert Matthew's answer to one which will work for any encoding (but which will, of course, incur the penalty of decoding all the data), you might end up with something like the code below. I'm assuming that you only care about \n, rather than the \r, \n and \r\n which TextReader normally handles:

public static int CountLines(string file, Encoding encoding)
{
    using (TextReader reader = new StreamReader(file, encoding))
    {
        return CountLines(reader);
    }
}

public static int CountLines(TextReader reader)
{
    char[] buffer = new char[32768];

    int charsRead;
    int count = 0;

    while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < charsRead; i++)
        {
            if (buffer[i] == '\n')
            {
                count++;
            }
        }
    }
    return count;
}
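If you do care about all three styles, a variation along the following lines might work. It is only a sketch: it treats \r\n as a single terminator and a lone \r or \n as one each, and the class and method names are hypothetical. Like the code above, it counts terminators, so an unterminated final line is not included.

```csharp
using System.IO;

static class AnyNewlineCounter
{
    // Counts line terminators: "\r\n" counts once, a lone "\r" or "\n" counts once.
    public static int CountLineBreaks(TextReader reader)
    {
        char[] buffer = new char[32768];
        int charsRead;
        int count = 0;
        bool previousWasCr = false;   // carries the "\r" state across buffer boundaries

        while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < charsRead; i++)
            {
                char c = buffer[i];
                if (c == '\r')
                {
                    count++;
                }
                else if (c == '\n' && !previousWasCr)
                {
                    // A '\n' directly after '\r' was already counted with the '\r'.
                    count++;
                }
                previousWasCr = (c == '\r');
            }
        }
        return count;
    }
}
```

For example, you could pass it new StreamReader(file, encoding) in the same way as the CountLines(TextReader) overload above.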
answered Sep 22 '22 at 01:09 by Jon Skeet