After looking into my question here, I found that it was caused by a simpler problem.
When I write `"\n"` to a file, I expect to read `"\n"` back from the file. This is not always the case on Windows.
```
In [1]: with open("out", "w") as file:
   ...:     file.write("\n")
   ...:

In [2]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [3]: s  # I expect "\n" and I get it
Out[3]: '\n'

In [4]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [5]: b  # I expect b"\n"... Uh-oh
Out[5]: b'\r\n'

In [6]: with open("out", "wb") as file:
   ...:     file.write(b"\n")
   ...:

In [7]: with open("out", "r") as file:
   ...:     s = file.read()
   ...:

In [8]: s  # I expect "\n" and I get it
Out[8]: '\n'

In [9]: with open("out", "rb") as file:
   ...:     b = file.read()
   ...:

In [10]: b  # I expect b"\n" and I get it
Out[10]: b'\n'
```
In a more organized way:
| Write mode | Read mode | What the written `"\n"` is read back as |
|------------|-----------|-----------------------------------------|
| `"w"`      | `"r"`     | `"\n"`                                  |
| `"w"`      | `"rb"`    | `b"\r\n"`                               |
| `"wb"`     | `"r"`     | `"\n"`                                  |
| `"wb"`     | `"rb"`    | `b"\n"`                                 |
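The four combinations in the table can be reproduced with a short, self-contained script. This is only a sketch: the temporary-file handling and the `results` dictionary are illustrative additions, not part of the original experiment. Because a text-mode write translates `"\n"` to `os.linesep`, the expected value of the `"w"`/`"rb"` row depends on the OS it runs on.

```python
import os
import tempfile

# Hypothetical scratch file instead of the "out" file used in the session
fd, path = tempfile.mkstemp()
os.close(fd)

results = {}
for write_mode in ("w", "wb"):
    for read_mode in ("r", "rb"):
        # Text mode takes str, binary mode takes bytes
        data = "\n" if write_mode == "w" else b"\n"
        with open(path, write_mode) as f:
            f.write(data)
        with open(path, read_mode) as f:
            results[(write_mode, read_mode)] = f.read()
os.remove(path)

for (write_mode, read_mode), value in results.items():
    print(write_mode, read_mode, repr(value))
# On Windows the ("w", "rb") row shows b'\r\n'; on Linux it shows b'\n'
```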
When I try this on my Linux virtual machine, I always get `\n` back. How can I get the same behavior on Windows?
Edit:
This is especially problematic with the pandas library, which appears to write `DataFrame`s to CSV with `"w"` and read CSVs with `"rb"`. See the question linked at the top for an example of this.
In Python strings, the backslash `\` is a special character, also called the "escape" character. It is used to represent certain whitespace characters: `"\t"` is a tab, `"\n"` is a newline, and `"\r"` is a carriage return.
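A quick way to see what these escapes stand for (a small illustrative snippet, not from the original answer): each escape sequence is a single character, so the Windows line ending `"\r\n"` is two characters.

```python
# Each escape sequence stands for exactly one character
print(len("\t"), len("\n"), len("\r"))  # 1 1 1
print(ord("\t"), ord("\n"), ord("\r"))  # 9 10 13

# The Windows line ending is therefore two characters: CR followed by LF
print(len("\r\n"))  # 2
```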
In Python, the newline character `"\n"` starts a new line: when it appears in a string, everything after it is displayed on the next line. In other words, `"\n"` marks where the current line ends.

The newline character in Python is `\n`; it marks the end of a line of text. You can print a string without a trailing newline by passing `end=<character>` to `print()`, where `<character>` is the string written in place of the newline.
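A minimal sketch of the `end` parameter. The output goes to an `io.StringIO` buffer via `print()`'s `file` argument only so the result can be inspected; that buffer is an illustrative choice, not part of the original answer.

```python
import io

buf = io.StringIO()

# Default: print() writes "\n" after its arguments
print("first", file=buf)
# end="" suppresses the newline, so the next print continues on the same line
print("second", end="", file=buf)
print(" and third", file=buf)
# end can be any string, for example a separator
print("x", end=" | ", file=buf)
print("y", file=buf)

print(repr(buf.getvalue()))  # 'first\nsecond and third\nx | y\n'
```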
Since you are using Python 3, you're in luck. When you open the file for writing, just specify `newline='\n'` to ensure that it writes `'\n'` instead of the system default, which is `\r\n` on Windows. From the docs:
> When writing output to the stream, if *newline* is `None`, any `'\n'` characters written are translated to the system default line separator, `os.linesep`. If *newline* is `''` or `'\n'`, no translation takes place. If *newline* is any of the other legal values, any `'\n'` characters written are translated to the given string.
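The quoted rules can be checked directly by writing in text mode and reading the bytes back in binary mode. This is a sketch; `bytes_written` is a hypothetical helper introduced only for this illustration, and the temporary file stands in for the question's `"out"` file.

```python
import os
import tempfile


def bytes_written(text, newline):
    """Hypothetical helper: write `text` in text mode with the given
    `newline` argument and return the raw bytes that end up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # mkstemp opens the file; close it so open() can reuse the path
    try:
        with open(path, "w", newline=newline, encoding="ascii") as f:
            f.write(text)
        # Binary mode does no translation, so this is the true file content
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(bytes_written("a\n", None))    # '\n' -> os.linesep: b'a\r\n' on Windows, b'a\n' on Linux
print(bytes_written("a\n", "\n"))    # no translation: b'a\n' on every OS
print(bytes_written("a\n", ""))      # no translation: b'a\n' on every OS
print(bytes_written("a\n", "\r\n"))  # translated to the given string: b'a\r\n' on every OS
```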
The reason you think you are "sometimes" seeing the two-character output is that when you open the file in binary mode, no conversion is done at all. Bytes objects are simply displayed as ASCII whenever possible, for your convenience. Don't think of them as real strings until they have been decoded. The binary output you show is the true content of the file in all of your examples.
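To make the point concrete, here is a small illustration (not from the original answer) of what the `b'\r\n'` from the question actually is: two bytes, and decoding them does not translate anything by itself.

```python
raw = b"\r\n"  # what mode "rb" returned after writing "\n" with mode "w" on Windows

print(len(raw))    # 2: two bytes are really in the file
print(list(raw))   # [13, 10]: carriage return (CR) followed by line feed (LF)

# Decoding turns bytes into str but performs no newline translation on its own
text = raw.decode("ascii")
print(repr(text))  # '\r\n'
```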
When you open the file for reading in the default text mode, the `newline` parameter works much as it does for writing. By default, every `\r\n` in the file is converted to just `\n` after the characters are decoded. This is very convenient when your code travels between OSes but your files do not, since you can use the exact same code that relies only on `\n`. If your files travel too, you should stick to the relatively portable `newline='\n'`, at least for the output.
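The reading side can be sketched the same way (an illustration with a temporary file; the file contents are made up for the example). With the default `newline=None`, `"\r\n"` is read back as `"\n"`; with `newline=""`, line endings are returned untranslated, which is useful when you need to see what is really in the file.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Write Windows-style and Unix-style line endings as raw bytes, bypassing translation
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\n")

# Default text mode (newline=None): "\r\n" is translated to "\n" on read
with open(path, "r", encoding="ascii") as f:
    universal = f.read()

# newline="": line endings are returned to the caller untranslated
with open(path, "r", newline="", encoding="ascii") as f:
    untranslated = f.read()

os.remove(path)
print(repr(universal))     # 'one\ntwo\n'
print(repr(untranslated))  # 'one\r\ntwo\n'
```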