Why does the line count differ between two different ways of loading text?

Uploaded: 2022-09-12 08:01:32

Question

import pathlib

file_path = 'vocab.txt'
vocab = pathlib.Path(file_path).read_text().splitlines()
print(len(vocab))

count = 0
with open(file_path, 'r', encoding='utf8') as f:
  for line in f:
    count += 1

print(count)

The two counts are 2122 and 2120. Shouldn't they be the same?

juanpa.arrivillaga · Accepted Answer

So, looking at the documentation for str.splitlines, we see that the line delimiters for this method are a superset of "universal newlines":

This method splits on the following line boundaries. In particular, the boundaries are a superset of universal newlines.

Representation	Description
`\n`	Line Feed
`\r`	Carriage Return
`\r\n`	Carriage Return + Line Feed
`\v` or `\x0b`	Line Tabulation
`\f` or `\x0c`	Form Feed
`\x1c`	File Separator
`\x1d`	Group Separator
`\x1e`	Record Separator
`\x85`	Next Line (C1 Control Code)
`\u2028`	Line Separator
`\u2029`	Paragraph Separator
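
As a quick check of the table above, here is a minimal sketch that puts each listed boundary between two characters and counts the pieces `str.splitlines` returns. The names `separators` and `split_counts` are only for illustration and are not part of the original answer.

```python
# Every line boundary from the table, as a Python string.
separators = [
    "\n", "\r", "\r\n", "\v", "\f",
    "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029",
]

def split_counts(seps):
    """Map each separator to the number of pieces 'a<sep>b' splits into."""
    return {sep: len(f"a{sep}b".splitlines()) for sep in seps}

counts = split_counts(separators)
for sep, n in counts.items():
    # repr() makes the otherwise invisible control characters readable.
    print(repr(sep), n)
```

Each separator should produce two pieces, including `\r\n`, which is treated as a single boundary rather than two.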

Iterating over a text file line by line, by contrast, uses the universal-newlines approach to interpret delimiters by default. From the docs:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If newline is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If newline has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
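
So a likely explanation for 2122 versus 2120 is that `vocab.txt` contains a couple of characters (a form feed, `\u2028`, and so on) that `str.splitlines` treats as line boundaries but file iteration does not. The sketch below reproduces that effect on a small made-up sample and lists where the extra boundaries sit. `SAMPLE`, `count_both_ways`, and `extra_separators` are hypothetical names introduced for the illustration, and the sample text is an assumption, not the asker's actual file.

```python
import pathlib
import tempfile

# Made-up sample: three "\n"-terminated lines, with a form feed inside the
# second line and a Unicode line separator inside the third.
SAMPLE = "alpha\nbeta\x0cgamma\ndelta\u2028epsilon\n"

def count_both_ways(text):
    """Return (splitlines count, file-iteration count) for the given text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "vocab.txt"
        # Write raw bytes so no newline translation happens on the way out.
        path.write_bytes(text.encode("utf8"))
        by_splitlines = len(path.read_text(encoding="utf8").splitlines())
        with open(path, "r", encoding="utf8") as f:
            by_iteration = sum(1 for _ in f)
    return by_splitlines, by_iteration

def extra_separators(text):
    """List (line number, character) pairs that only str.splitlines splits on.

    Assumes the text uses "\n" line endings, as it does after being read
    in the default universal-newlines mode.
    """
    extras = "\v\f\x1c\x1d\x1e\x85\u2028\u2029"
    found = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        found.extend((lineno, ch) for ch in line if ch in extras)
    return found

by_splitlines, by_iteration = count_both_ways(SAMPLE)
print(by_splitlines, by_iteration)
print(extra_separators(SAMPLE))
```

Running `extra_separators` on the text returned by `read_text` for the real file should point at the lines responsible for the difference. If the count needs to match file iteration, splitting on `"\n"` only is one option, keeping in mind that a trailing newline then yields one empty final element.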

Tags:

python

marlon
