
Unexpected behavior of universal newline mode with StringIO and csv modules

Consider the following (Python 3.2 under Windows):

>>> import io
>>> import csv
>>> output = io.StringIO()         # default parameter newline=None
>>> csvdata = [1, 'a', 'Whoa!\nNewlines!']
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
25
>>> output.getvalue()
'1,"a","Whoa!\nNewlines!"\r\n'

Why is there a single \n - shouldn't it have been converted to \r\n since universal newlines mode is enabled?

The io.TextIOWrapper docs say this about newline=None: "With this enabled, on input, the lines endings \n, \r, or \r\n are translated to \n before being returned to the caller. Conversely, on output, \n is translated to the system default line separator, os.linesep."

Tim Pietzcker asked Feb 06 '12 08:02



2 Answers

The "single" \n occurs as a data character inside the third field. Consequently that field is quoted so that a csv reader will treat it as part of the data. It is NOT a "line terminator" (should be called a row separator) or part thereof. To get a better appreciation of the quoting, remove the quoting=csv.QUOTE_NONNUMERIC.
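One way to check that the embedded \n is field data rather than a row boundary is to read the output back. This is only a minimal sketch: the names written, rows and minimal are illustrative and not part of the question.

```python
import csv
import io

# The exact string produced in the question.
written = '1,"a","Whoa!\nNewlines!"\r\n'

# newline='' passes line endings to the csv reader untranslated.
rows = list(csv.reader(io.StringIO(written, newline='')))
print(rows)  # one row, three fields; the \n stays inside the third field

# The same data written with the default quoting (csv.QUOTE_MINIMAL):
# only the field containing the newline needs quotes.
buf = io.StringIO()
csv.writer(buf).writerow([1, 'a', 'Whoa!\nNewlines!'])
minimal = buf.getvalue()
print(repr(minimal))
```

Without QUOTE_NONNUMERIC the reader returns every field as a string, which is why the first field comes back as '1'.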

The \r\n is produced because csv terminates rows with the dialect.lineterminator whose default is \r\n. In other words, the "universal newlines" setting is ignored.
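The row terminator can be inspected and overridden directly, which shows that it comes from the csv dialect and not from the stream. A small sketch, where write_row is a hypothetical helper introduced only for this illustration:

```python
import csv
import io

def write_row(row, **fmtparams):
    """Write one row to an in-memory buffer and return the raw text."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

row = [1, 'a', 'Whoa!\nNewlines!']

# Default dialect ("excel"): rows end with '\r\n' on every platform.
default_text = write_row(row, quoting=csv.QUOTE_NONNUMERIC)

# Overriding lineterminator changes the row ending only;
# the newline inside the quoted field is left alone.
unix_text = write_row(row, quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')

print(repr(csv.excel.lineterminator))
print(repr(default_text))
print(repr(unix_text))
```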

Update

The 2.7 and 3.2 docs for io.StringIO are virtually identical as far as the newline arg is concerned.

"The newline argument works like that of TextIOWrapper. The default is to do no newline translation."

We'll examine the first sentence below. The second sentence is true for output, depending on your interpretation of "default" and "newline translation".

TextIOWrapper docs:

newline can be None, '', '\n', '\r', or '\r\n'. It controls the handling of line endings. If it is None, universal newlines is enabled. With this enabled, on input, the lines endings '\n', '\r', or '\r\n' are translated to '\n' before being returned to the caller. Conversely, on output, '\n' is translated to the system default line separator, os.linesep. If newline is any other of its legal values, that newline becomes the newline when the file is read and it is returned untranslated. On output, '\n' is converted to the newline.

Python 3.2 on Windows:

>>> from io import StringIO as S
>>> import os
>>> print(repr(os.linesep))
'\r\n'
>>> ss = [S()] + [S(newline=nl) for nl in (None, '', '\n', '\r', '\r\n')]
>>> for x, s in enumerate(ss):
...     m = s.write('foo\nbar\rzot\r\n')
...     v = s.getvalue()
...     print(x, m, len(v), repr(v))
...
0 13 13 'foo\nbar\rzot\r\n'
1 13 12 'foo\nbar\nzot\n'
2 13 13 'foo\nbar\rzot\r\n'
3 13 13 'foo\nbar\rzot\r\n'
4 13 13 'foo\rbar\rzot\r\r'
5 13 15 'foo\r\nbar\rzot\r\r\n'
>>>

Line 0 shows that the "default" that you get with no newline arg involves no translation of \n (or any other character). It is certainly NOT converting '\n' to os.linesep.

Line 1 shows that what you get with newline=None (should be the same as line 0, shouldn't it??) is in effect INPUT universal newlines translation -- bizarre!

Line 2: newline='' makes no change, like line 0. It is certainly NOT converting '\n' to ''.

Lines 3, 4, and 5: as the docs say, '\n' is converted to the value of the newline arg.

The equivalent Python 2.X code produces equivalent results with Python 2.7.2.

Update 2

For consistency with built-in open(), the default should be os.linesep, as documented. To get the no-translation-on-output behaviour, use newline=''. Note: the open() docs are much clearer. I'll submit a bug report tomorrow.
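For comparison, here is a sketch of the output behaviour that the TextIOWrapper wording describes, using a TextIOWrapper over an in-memory byte buffer instead of StringIO. The helper name bytes_written is hypothetical, and the first result depends on the platform because it follows os.linesep.

```python
import io
import os

def bytes_written(text, newline):
    """Return the bytes a TextIOWrapper emits for `text` in the given newline mode."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding='utf-8', newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # release the buffer so the wrapper does not close it
    return data

native = bytes_written('a\nb', None)      # '\n' becomes os.linesep
untouched = bytes_written('a\nb', '')     # no translation on output
crlf = bytes_written('a\nb', '\r\n')      # '\n' becomes '\r\n'

print(repr(os.linesep), native, untouched, crlf)
```

On Windows the first result contains \r\n; on Linux or macOS it is identical to the untranslated one.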

John Machin answered Oct 01 '22 05:10



From the docs for StringIO:

"The newline argument works like that of TextIOWrapper. The default is to do no newline translation."

So StringIO is not doing any newline translation normally. That default makes sense - StringIO isn't writing to disk, so it doesn't need to translate to the platform-specific newlines.

As John pointed out, the csv module writes its own row terminator (dialect.lineterminator, \r\n by default), but it does not touch newlines within field values.
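Because csv adds its own row ending, asking the stream to translate newlines as well can double up the \r. The sketch below assumes you really do want every \n, including those inside fields, written as \r\n; the variable names are illustrative only.

```python
import csv
import io

row = [1, 'a', 'Whoa!\nNewlines!']

# The stream turns every '\n' into '\r\n', and csv has already ended
# the row with '\r\n', so the row ending comes out as '\r\r\n'.
doubled = io.StringIO(newline='\r\n')
csv.writer(doubled).writerow(row)

# Let the stream do the translation and have csv end rows with a bare '\n'.
single = io.StringIO(newline='\r\n')
csv.writer(single, lineterminator='\n').writerow(row)

print(repr(doubled.getvalue()))
print(repr(single.getvalue()))
```

For real files, the csv documentation's advice is to open them with newline='' so that only the csv module decides what a row ending looks like.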

Thomas K answered Oct 01 '22 04:10
