I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?
At the moment I do:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
Is it possible to do any better?
To count all the lines of code in the files in a directory, call the "countIn" function, passing the directory as a parameter.
The wc command is used to find the number of lines, characters, words, and bytes of a file. To find the number of lines using wc, we add the -l option. This will give us the total number of lines and the name of the file.
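If you want to use wc from Python, one option is to call it through subprocess. This is only a sketch: it assumes a Unix-like system where wc is on the PATH, and wc_line_count is a name made up for illustration. Note that wc -l counts newline characters, so a last line without a trailing newline is not included in the count.

```python
import subprocess


def wc_line_count(fname):
    # Assumes a Unix-like system with `wc` available on the PATH.
    # `wc -l <file>` prints something like "  1234 myfile.txt".
    result = subprocess.run(
        ["wc", "-l", fname],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError if wc fails
    )
    # The first whitespace-separated field is the line count.
    return int(result.stdout.split()[0])
```

Starting an external process has its own overhead, so this is not automatically faster than counting in Python; it is worth timing on your own files.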
Method 1: Read a file line by line using readlines()
This function is suitable for small files, because it reads the whole file content into memory and then splits it into separate lines. We can iterate over the resulting list and strip the newline character '\n' from each line using the strip() function.
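A minimal sketch of the readlines() approach described above. The function name read_lines_stripped is just for illustration. Keep in mind that strip() removes all leading and trailing whitespace, not only the newline; rstrip('\n') would remove just the newline.

```python
def read_lines_stripped(fname):
    with open(fname) as f:
        # readlines() loads the whole file into memory as a list of lines,
        # each still ending with its newline character (except possibly the last).
        lines = f.readlines()
    # strip() drops leading/trailing whitespace, including the '\n'.
    return [line.strip() for line in lines]
```

The line count is then len(read_lines_stripped('myfile.txt')), but for a file with hundreds of thousands of lines this keeps every line in memory at once.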
One line, probably pretty fast:
num_lines = sum(1 for line in open('myfile.txt'))
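The one-liner leaves closing the file to the interpreter. If that bothers you, here is the same idea wrapped in a with block, as a sketch (count_lines is a name I picked for illustration):

```python
def count_lines(fname):
    # The with block closes the file as soon as counting is done.
    with open(fname) as f:
        # The generator yields 1 per line, so only one line is held
        # in memory at a time.
        return sum(1 for _ in f)
```

Unlike a loop that returns i + 1 after enumerate(), this returns 0 for an empty file instead of raising an error.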
You can't do much better than what you already have.
After all, any solution will have to read the entire file, count how many \n characters it contains, and return that result.
Do you have a better way of doing that without reading the entire file? Not sure... The best solution will always be I/O-bound; the best you can do is make sure you don't use unnecessary memory, and it looks like you have that covered.
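To make the "read the whole file and count the \n" idea concrete, here is a rough sketch that reads the file in fixed-size binary chunks, so memory use stays bounded regardless of file size. The name count_newlines and the 1 MiB default chunk size are my own assumptions, not anything from the question.

```python
def count_newlines(fname, chunk_size=1024 * 1024):
    count = 0
    # Binary mode avoids decoding the text; we only look for b"\n".
    with open(fname, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count
```

This counts newline characters, so a final line without a trailing newline is not counted. It still reads every byte of the file, so it stays I/O-bound; whether it beats iterating over lines depends on the file and the system, so measure before switching.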