A recent question about splitting a binary file using null characters made me think of a similar text-oriented question.
Given the following file:
Parse me using spaces, please.
Using Raku, I can parse this file using space (or any chosen character) as the input newline character, thus:
my $fh = open('spaced.txt', nl-in => ' ');
while $fh.get -> $line {
    put $line;
}
Or more concisely:
.put for 'spaced.txt'.IO.lines(nl-in => ' ');
Either of which gives the following result:
Parse
me
using
spaces,
please.
Is there something equivalent in Python 3?
The closest I could find required reading an entire file into memory:
for line in f.read().split('\0'):
    print(line)
Update: I found several other older questions and answers that seemed to indicate that this isn't available, but I figured there may have been new developments in this area in the last several years:
Python restrict newline characters for readlines()
Change newline character .readline() seeks
In Python, the newline character is written \n and marks the end of a line of text. When it appears inside a string, the characters that follow it are displayed on a new line.
By default, print() appends \n to whatever it prints. Passing end=<character> makes print() append <character> instead, so successive calls do not have to start a new line.
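As a small illustration of the end parameter described above, the following sketch prints the words of the example sentence first with the default newline and then with a space. The helper name render and the use of io.StringIO to capture the output are assumptions made for demonstration only.

```python
import io

def render(words, end):
    """Print each word followed by `end` and return the captured text."""
    out = io.StringIO()
    for word in words:
        # `end` replaces the default "\n" that print() appends after each call
        print(word, end=end, file=out)
    return out.getvalue()

words = ["Parse", "me", "using", "spaces,", "please."]
print(repr(render(words, "\n")))  # every word is followed by a newline
print(repr(render(words, " ")))   # every word is followed by a space instead
```

Note that end only affects output. It does nothing to change how a file is split into lines when reading.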
readline() reads the next line from the file and returns it as a string. The returned string includes the newline character at the end, except possibly for the last line of a file that does not end with one.
When the end of the file is reached, readline() returns an empty string.
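The readline() behaviour described above can be checked with an in-memory stream. This is a minimal sketch: the helper name read_all_lines is hypothetical, and io.StringIO stands in for a real file.

```python
import io

def read_all_lines(stream):
    """Collect lines with readline() until it returns an empty string (EOF)."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":  # an empty string signals end of file
            break
        lines.append(line)  # the trailing "\n", if any, is kept
    return lines

sample = io.StringIO("first\nsecond\nlast without newline")
for line in read_all_lines(sample):
    print(repr(line))
```

The loop relies on the fact that a blank line inside the file is returned as "\n", not as an empty string, so only a true end of file stops it.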
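There is no built-in support for reading a file split by a custom character.
However, opening a file with the "U" flag enables universal newlines, and the line endings that were actually encountered can be read from file.newlines. The newline mode applies to the whole file. In Python 3, universal newlines is already the default in text mode and the "U" flag was removed in Python 3.11; the newline argument of open() accepts only None, '', '\n', '\r' and '\r\n', so it cannot be set to an arbitrary separator.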
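To make the limitation concrete, the sketch below probes which values open() accepts for its newline argument and shows what the newlines attribute reports after a read. The helper names make_sample, try_newline and seen_newlines, and the scratch file they use, are assumptions introduced for illustration.

```python
import os
import tempfile

def make_sample(data):
    """Write `data` (bytes) to a hypothetical scratch file and return its path."""
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(data)
        return tmp.name

def try_newline(path, value):
    """Return True if open() accepts `value` for its newline parameter."""
    try:
        with open(path, "r", newline=value):
            return True
    except ValueError:
        return False  # open() rejects anything but the standard line endings

def seen_newlines(path):
    """Read the file in universal-newlines mode and report f.newlines."""
    with open(path, "r", newline=None) as f:
        f.read()
        return f.newlines

path = make_sample(b"one\r\ntwo\r\n")
try:
    for candidate in (None, "", "\n", "\r", "\r\n", " ", "|"):
        print(repr(candidate), try_newline(path, candidate))
    print(repr(seen_newlines(path)))
finally:
    os.remove(path)
```

A space or any other custom character is rejected with a ValueError, which is why a hand-written generator is needed for this task.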
Here is my generator, which reads a file without loading all of it into memory:
def customReadlines(fileNextBuff, char):
    """
    Yield the parts of a stream that are separated by char.

    fileNextBuff: a function returning the next buffer, or "" on EOF
    char: the string on which the lines are split; it is not included in the yielded elements
    """
    lastLine = ""
    lenChar = len(char)
    while True:
        thisLine = fileNextBuff()  # call the function to fetch the next buffer
        if not thisLine:
            break  # EOF
        # Prepend the leftover so a separator spanning two buffers is still found
        thisLine = lastLine + thisLine
        fnd = thisLine.find(char)
        while fnd != -1:
            yield thisLine[:fnd]
            thisLine = thisLine[fnd + lenChar:]
            fnd = thisLine.find(char)
        lastLine = thisLine
    yield lastLine
### EXAMPLES ###
# Open file.bin and print each null-terminated part of the file, loading a buffer of 256 characters at a time
with open("file.bin", "r") as f:
    for l in customReadlines((lambda: f.read(0x100)), "\0"):
        print(l)
# Open errors.log and split it on a special string, loading one whole line at a time
with open("errors.log", "r") as f:
    for l in customReadlines(f.readline, "ERROR:"):
        print(l)
        print(" " + '-' * 78)  # some separator
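Applied to the sentence from the question, the buffered approach can be sketched as follows. This is a compact variant written for illustration, not the generator from the answer: the name split_stream and the bufsize parameter are assumptions, and io.StringIO stands in for spaced.txt.

```python
import io

def split_stream(stream, sep, bufsize=8):
    """Yield the pieces of `stream` separated by `sep`, reading `bufsize` characters at a time."""
    pending = ""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break  # EOF
        pending += chunk
        # Everything before the last separator is complete; the tail may
        # continue in the next chunk, so it is kept for the next round.
        *pieces, pending = pending.split(sep)
        yield from pieces
    yield pending  # whatever follows the final separator

text = io.StringIO("Parse me using spaces, please.")
for piece in split_stream(text, " "):
    print(piece)
```

Only the current buffer and the unfinished tail are held in memory, so the cost depends on the buffer size and the longest piece rather than on the size of the file.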