For some large file,
lines_a = len(fa.readlines())
print(lines_a)
And for Bash (on Mac):
wc -l
the results are different!
What is the possible reason?
wc -l prints the number of newline characters in its input. In other words, its definition of "line" in "line count" requires the line to end with a newline, which is how POSIX defines a line.
This definition can yield surprising behavior if the last line in your file does not end with a newline. Although such a line is displayed just fine in text editors and pagers, wc will not count it as a line. For example:
$ printf 'foo\nbar\n' | wc -l
2
$ printf 'foo\nbar' | wc -l
1
Python's readlines() method, on the other hand, is designed to return the data in the file so that it can be perfectly reconstructed. For that reason, it returns each line with its final newline, and the last non-empty line as-is (with or without a final newline). For the above example, it returns the lists ["foo\n", "bar\n"] and ["foo\n", "bar"] respectively, both of length two:
$ printf 'foo\nbar' | python -c 'import sys; print(len(sys.stdin.readlines()))'
2
$ printf 'foo\nbar\n' | python -c 'import sys; print(len(sys.stdin.readlines()))'
2
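To get a count from Python that follows the same rule as wc -l, one option is to count newline bytes directly instead of counting the items returned by readlines(). The following is a minimal sketch, not a definitive implementation; the function names count_newlines and count_readlines and the 1 MiB chunk size are assumptions chosen for illustration.

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, which is what wc -l reports."""
    total = 0
    # Binary mode avoids any newline translation; reading in chunks keeps
    # memory use flat even for very large files.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_readlines(path):
    """Count items returned by readlines(), including an unterminated last line."""
    with open(path, "rb") as f:
        return len(f.readlines())
```

For a file containing foo\nbar with no trailing newline, count_newlines returns 1 while count_readlines returns 2, which mirrors the difference shown above.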
I ran into a similar problem while working on a machine translation task. The line count may be off because the file was not opened in 'b' (binary) mode. In Python 3 text mode, a lone '\r' and '\r\n' are also treated as line endings by default, whereas wc -l counts only '\n'. So try:
with open('some file', 'rb') as f:
    print(len(f.readlines()))
This should give the same number as wc -l, as long as the file ends with a newline.
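The effect of the open mode can be checked on a small sample. The sketch below assumes Python 3 with the default newline handling in text mode; the helper name count_lines and the sample bytes are made up for illustration.

```python
import os
import tempfile


def count_lines(path, mode):
    """Return len(readlines()) for the file opened in the given mode."""
    with open(path, mode) as f:
        return len(f.readlines())


# Hypothetical sample: a stray carriage return inside the first line.
data = b"a\rb\nc\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
path = tmp.name

text_count = count_lines(path, "r")     # '\r' is treated as a line ending
binary_count = count_lines(path, "rb")  # only b'\n' ends a line
os.remove(path)

print(text_count, binary_count)  # prints: 3 2
```

The sample contains two newline bytes, so wc -l would report 2, matching the binary-mode count.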