
Make the readline method of Python recognize both end-of-line variations?

Tags:

python

I am writing a Python script that needs to read in several files of different types. I am reading each file line by line with the traditional for line in f loop after opening it with f = open("file.txt", "r").

This doesn't seem to be working for all files. My guess is that some files use different line endings (such as \r\n versus just \r). I could read the whole file in and do a string split on \r, but that is hugely costly and I'd rather not. Is there a way to make the readline method of Python recognize both end-of-line variations?

asked Nov 11 '10 19:11 by socks_swerve



1 Answer

Use the universal newline support -- see http://docs.python.org/library/functions.html#open

In addition to the standard fopen() values mode may be 'U' or 'rU'. Python is usually built with universal newline support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program. If Python is built without universal newline support a mode with 'U' is the same as normal text mode. Note that file objects so opened also have an attribute called newlines which has a value of None (if no newlines have yet been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types seen.
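The quoted passage describes the Python 2 'U' mode. As a minimal sketch of the same idea in Python 3, where text mode applies universal newline translation by default (newline=None) and the 'U' flag was removed in Python 3.11, something like the following should work. The helper name read_lines_universal and the sample file contents are made up for illustration.

```python
import os
import tempfile


def read_lines_universal(path):
    # Hypothetical helper: read a text file line by line, relying on
    # Python 3's default universal newline translation (newline=None),
    # which maps "\r", "\r\n" and "\n" to "\n" while reading.
    with open(path, "r", newline=None) as f:
        # Iterating the file object reads lazily, one line at a time.
        return [line.rstrip("\n") for line in f]


# Build a small sample file containing all three line-ending styles.
# It is written in binary mode so the bytes land on disk unchanged.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"unix\nmac\rwindows\r\nlast")
    sample_path = tmp.name

lines = read_lines_universal(sample_path)
os.remove(sample_path)

print(lines)  # ['unix', 'mac', 'windows', 'last']
```

Because the translation happens while the file is being read, there is no need to load the whole file and split on \r by hand; the ordinary for line in f loop stays lazy.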

answered Oct 02 '22 11:10 by bgporter