Consider the following (Python 3.2 under Windows):
>>> import io
>>> import csv
>>> output = io.StringIO() # default parameter newline=None
>>> csvdata = [1, 'a', 'Whoa!\nNewlines!']
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
25
>>> output.getvalue()
'1,"a","Whoa!\nNewlines!"\r\n'
Why is there a single \n? Shouldn't it have been converted to \r\n, since universal newlines mode is enabled?
With this enabled, on input, the lines endings \n, \r, or \r\n are translated to \n before being returned to the caller. Conversely, on output, \n is translated to the system default line separator, os.linesep.
The "single" \n occurs as a data character inside the third field. Consequently that field is quoted so that a csv reader will treat it as part of the data. It is NOT a "line terminator" (should be called a row separator) or part thereof. To get a better appreciation of the quoting, remove the quoting=csv.QUOTE_NONNUMERIC argument.
The \r\n is produced because csv terminates rows with the dialect.lineterminator, whose default is \r\n. In other words, the "universal newlines" setting is ignored.
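A minimal sketch of that point, assuming Python 3. The helper name `render_row` is made up for illustration. It builds the StringIO with newline='' so the buffer itself leaves line endings alone, and varies only the csv writer's lineterminator:

```python
import csv
import io


def render_row(row, lineterminator="\r\n"):
    """Write one row with csv.writer and return the raw text produced."""
    # newline='' keeps StringIO from touching any line endings itself.
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator=lineterminator
    )
    writer.writerow(row)
    return buf.getvalue()


row = [1, "a", "Whoa!\nNewlines!"]
print(repr(render_row(row)))                       # csv default terminator
print(repr(render_row(row, lineterminator="\n")))  # explicit "\n" terminator
```

Only the row terminator should differ between the two results; the \n inside the quoted third field is data and is written as-is either way.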
Update
The 2.7 and 3.2 docs for io.StringIO are virtually identical as far as the newline arg is concerned.
The newline argument works like that of TextIOWrapper. The default is to do no newline translation.
We'll examine the first sentence below. The second sentence is true for output, depending on your interpretation of "default" and "newline translation".
TextIOWrapper docs:
newline can be None, '', '\n', '\r', or '\r\n'. It controls the handling of line endings. If it is None, universal newlines is enabled. With this enabled, on input, the lines endings '\n', '\r', or '\r\n' are translated to '\n' before being returned to the caller. Conversely, on output, '\n' is translated to the system default line separator, os.linesep. If newline is any other of its legal values, that newline becomes the newline when the file is read and it is returned untranslated. On output, '\n' is converted to the newline.
Python 3.2 on Windows:
>>> from io import StringIO as S
>>> import os
>>> print(repr(os.linesep))
'\r\n'
>>> ss = [S()] + [S(newline=nl) for nl in (None, '', '\n', '\r', '\r\n')]
>>> for x, s in enumerate(ss):
...     m = s.write('foo\nbar\rzot\r\n')
...     v = s.getvalue()
...     print(x, m, len(v), repr(v))
...
0 13 13 'foo\nbar\rzot\r\n'
1 13 12 'foo\nbar\nzot\n'
2 13 13 'foo\nbar\rzot\r\n'
3 13 13 'foo\nbar\rzot\r\n'
4 13 13 'foo\rbar\rzot\r\r'
5 13 15 'foo\r\nbar\rzot\r\r\n'
>>>
Line 0 shows that the "default" that you get with no newline arg involves no translation of \n (or any other character). It is certainly NOT converting '\n' to os.linesep.
Line 1 shows that what you get with newline=None (should be the same as line 0, shouldn't it??) is in effect INPUT universal newlines translation -- bizarre!
Line 2: newline='' makes no change, like line 0. It is certainly NOT converting '\n' to ''.
Lines 3, 4, and 5: as the docs say, '\n' is converted to the value of the newline arg.
The equivalent Python 2.X code produces equivalent results with Python 2.7.2.
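For anyone who wants to rerun the experiment as a script rather than at the prompt, here is a rough sketch assuming Python 3. The names `SAMPLE`, `stringio_roundtrip`, and `results` are invented for this example. It leaves out the no-argument case and only passes newline explicitly:

```python
import io

SAMPLE = "foo\nbar\rzot\r\n"


def stringio_roundtrip(newline):
    """Write SAMPLE to a StringIO built with this newline and read it back."""
    buf = io.StringIO(newline=newline)
    buf.write(SAMPLE)
    return buf.getvalue()


# One entry per legal value of the newline argument.
results = {nl: stringio_roundtrip(nl) for nl in (None, "", "\n", "\r", "\r\n")}
for nl, value in results.items():
    print(repr(nl), len(value), repr(value))
```

Comparing the printed values with the table above is a quick way to check whether a given Python version still behaves as described.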
Update 2
For consistency with built-in open(), the default should be os.linesep, as documented. To get the no-translation-on-output behaviour, use newline=''. Note: the open() docs are much clearer. I'll submit a bug report tomorrow.
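To see the open() behaviour being used as the reference point, something like the following should do, assuming Python 3. The helper `bytes_written` is hypothetical and writes to a throwaway temporary file so that the bytes on disk can be inspected:

```python
import os
import tempfile


def bytes_written(newline):
    """Write 'a\\n' through open() in text mode and return the bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write("a\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(bytes_written(None))  # "\n" becomes os.linesep
print(bytes_written(""))    # "\n" is written unchanged
```

With newline=None the result depends on the platform, since it follows os.linesep; with newline='' the \n should reach the file untouched.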
From the docs for StringIO:
The newline argument works like that of TextIOWrapper. The default is to do no newline translation.
So StringIO is not doing any newline translation normally. That default makes sense - StringIO isn't writing to disk, so it doesn't need to translate to the platform-specific newlines.
As John pointed out, the csv module writes its own row terminator (dialect.lineterminator, \r\n by default), but only at row endings; newlines inside field values are left as they are.