
What does the argument newline='' do in the open function?

Tags:

python

I was learning Python on Codecademy, and the course covered using the open() function with CSV files. I couldn't work out what the argument newline='' does in this code:

import csv

with open('addresses.csv', newline='') as addresses_csv:
  address_reader = csv.DictReader(addresses_csv, delimiter=';')
  for row in address_reader:
    print(row['Address'])
Hmmm asked Nov 16 '25 03:11


1 Answer

I see the question has been answered already, but here's a good summary for the usage of the newline argument.

According to the official csv documentation, a file should be opened with newline='' on all platforms, so that line endings are passed through without translation. Universal newlines is a way of interpreting text streams in which all of the following are recognized as ending a line: the Unix convention '\n', the Windows convention '\r\n', and the old Macintosh convention '\r'.

The newline argument of the open() function controls how newline translation works. It can take any of the following values: None, '', '\n', '\r', '\r\n'. According to the official Python documentation, it works as follows:

When reading input from the stream (that is, when the file is
opened in a read mode such as 'r'), if newline is None (the
default), universal newlines mode is enabled: lines in the input
can end in '\n', '\r', or '\r\n', and these are translated into
'\n' before being returned to the caller. If it is '', universal
newlines mode is still enabled, but line endings are returned to
the caller untranslated. If it has any of the other values, input
lines are only terminated by the given string, and the line ending
is returned to the caller untranslated.
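
As a rough illustration of the three reading modes, the sketch below wraps some in-memory bytes in io.TextIOWrapper, the text layer that open() returns for text files. The sample bytes and the helper name read_lines are made up for this example.

```python
import io

# Hypothetical sample with all three line-ending conventions mixed together.
RAW = b"a\nb\r\nc\rd"

def read_lines(newline):
    """Decode RAW as open() would with the given newline argument."""
    stream = io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii", newline=newline)
    return stream.readlines()

# None: every line ending is recognized and translated to '\n'.
translated = read_lines(None)      # ['a\n', 'b\n', 'c\n', 'd']

# '': every line ending is recognized but returned as it was in the input.
untranslated = read_lines("")      # ['a\n', 'b\r\n', 'c\r', 'd']

# '\r\n': only '\r\n' ends a line; a lone '\n' or '\r' stays inside the line.
crlf_only = read_lines("\r\n")     # ['a\nb\r\n', 'c\rd']
```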

When writing output to the stream, if newline is None, any '\n'
characters written are translated to the system default line
separator (os.linesep, which is '\r\n' on Windows and '\n' on
POSIX systems). If newline is set to '' or '\n', no translation
takes place. If newline is any of the other values, any '\n'
characters written are translated to the given value. Note that
only '\n' is translated: a '\r' written to the stream is left
as it is.
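
The writing rules can be checked in a similar way. This is only a sketch: it writes to an in-memory buffer through io.TextIOWrapper instead of a real file, and the helper name written_bytes and the sample text are assumptions made for the example.

```python
import io
import os

def written_bytes(text, newline):
    """Return the bytes that would reach the file for the given newline argument."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()  # push the text through to the underlying buffer
    return buffer.getvalue()

TEXT = "x\ry\n"  # one stray '\r' followed by one '\n'

as_is = written_bytes(TEXT, "")               # b'x\ry\n'   (no translation)
lf = written_bytes(TEXT, "\n")                # b'x\ry\n'   (no translation)
crlf = written_bytes(TEXT, "\r\n")            # b'x\ry\r\n' (only '\n' is translated)
platform_default = written_bytes(TEXT, None)  # '\n' becomes os.linesep
```

In every case the '\r' in the middle of the text comes out unchanged.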

The csv documentation recommends opening a file with newline='' because the csv writer terminates each row with \r\n by default, as mentioned in the official documentation. On Windows, if newline is left as None when writing with open(), the \r at the end of each row is written as it is while the \n is translated to \r\n, as described in the paragraph on writing above. This means that text like:

Line one
Line two

is written to the file as 'Line one\r\r\nLine two\r\r\n'. The problem shows up when the file is read back: with no newline argument given, it defaults to None, so both \r and \r\n are treated as line endings and every row is followed by an empty line. A csv file in this form probably shows blank lines between rows in other programs (Excel, Notepad, etc.) because they handle newlines much like open() does and take both \r and \r\n as line endings.
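
To see the doubled \r without a Windows machine, one option is to pass newline='\r\n' explicitly, which is what the default of None amounts to on Windows when writing. The following is a minimal sketch that writes to an in-memory buffer; the helper name csv_bytes is hypothetical.

```python
import csv
import io

def csv_bytes(newline):
    """Write two rows with csv.writer and return the raw bytes produced."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", newline=newline)
    writer = csv.writer(stream)  # default line terminator is '\r\n'
    writer.writerow(["Line one"])
    writer.writerow(["Line two"])
    stream.flush()
    return buffer.getvalue()

# Stand-in for newline=None on Windows: the '\n' of each '\r\n' is expanded again.
windows_like = csv_bytes("\r\n")   # b'Line one\r\r\nLine two\r\r\n'

# With newline='' the writer's '\r\n' reaches the file untouched.
recommended = csv_bytes("")        # b'Line one\r\nLine two\r\n'
```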

You can check this yourself on Windows by creating a csv file without specifying the newline argument, so that line endings are written as \r\r\n:

import csv

# Raw string, so that '\t' in the placeholder path is not read as a tab.
with open(r'path\to\csv_file.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['row', 'one'])
    writer.writerow(['row', 'two'])

then reading the file back without the newline argument:

with open(r"path\to\csv_file.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Output:

['row', 'one']
[]
['row', 'two']
[]

With newline='\r\n' when reading, you get this output instead:

['row', 'one']
['row', 'two']

In the first case, Python reads the first line up to the \r and builds a list from the data before that character. Then, before any further data, it meets another line ending, \r\n, so it builds another list, which is empty because there was no data between the two line endings. The same happens for every following row. In the second case, only \r\n is treated as the end of a line.
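
The two reading cases can also be reproduced on any platform by feeding the doubled bytes to csv.reader directly. This is a sketch under the assumption that the file contains exactly the bytes shown; the helper name read_rows is made up for the example.

```python
import csv
import io

# Bytes as they end up on disk when the rows are written on Windows
# without newline=''.
DOUBLED = b"row,one\r\r\nrow,two\r\r\n"

def read_rows(newline):
    """Parse DOUBLED with csv.reader using the given newline argument."""
    stream = io.TextIOWrapper(io.BytesIO(DOUBLED), encoding="ascii", newline=newline)
    return list(csv.reader(stream))

# Default (None): '\r' and '\r\n' both end a line, so empty rows appear.
default_rows = read_rows(None)   # [['row', 'one'], [], ['row', 'two'], []]

# '\r\n' only: each row is read as a single line.
crlf_rows = read_rows("\r\n")    # [['row', 'one'], ['row', 'two']]
```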

Bavely answered Nov 18 '25 17:11