How to split only on carriage returns with readlines in python?

I have a text file that contains both \n and \r\n end-of-line markers. I want to split only on \r\n, but can't figure out a way to do this with Python's readlines method. Is there a simple workaround for this?

As @eskaev mentions, you'll usually want to avoid reading the complete file into memory if not necessary.

io.open() allows you to specify a newline keyword argument, so you can still iterate over lines and have them split only at the specified newlines:

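import io

# newline='\r\n' makes '\r\n' the only line terminator; a lone '\n' stays inside the line.
with io.open('in.txt', newline='\r\n') as f:
    for line in f:
        print(repr(line))

Output (from Python 2; Python 3 prints the same strings without the u prefix):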

u'this\nis\nsome\r\n'
u'text\nwith\nnewlines.'
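
Since the question asks about readlines, here is a minimal Python 3 sketch of the same idea using the built-in open(), which accepts the same newline argument. The sample bytes and the temporary file are assumptions made only so the example is self-contained:

```python
import os
import tempfile

# Hypothetical sample data with mixed line endings (contents are illustrative).
data = b'this\nis\nsome\r\ntext\nwith\nnewlines.'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'in.txt')
    with open(path, 'wb') as f:
        f.write(data)

    # newline='\r\n': only '\r\n' ends a line; a lone '\n' stays inside the line.
    with open(path, newline='\r\n') as f:
        lines = f.readlines()

print(lines)  # ['this\nis\nsome\r\n', 'text\nwith\nnewlines.']
```

Note that the '\r\n' terminator is kept at the end of each line, just as readlines normally keeps '\n'.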

Avoid reading it in default text mode. Python reads text files with universal newline support, which means that every line ending (\n, \r, or \r\n) is translated to \n:

>>> with open('out', 'wb') as f:
...     f.write(b'a\nb\r\nc\r\nd\ne\r\nf')
... 
14
>>> with open('out', 'r') as f: f.readlines()
... 
['a\n', 'b\n', 'c\n', 'd\n', 'e\n', 'f']

Note that using U doesn't change the result¹ (the U flag was removed in Python 3.11, where this mode string raises a ValueError):

>>> with open('out', 'rU') as f: f.readlines()
... 
['a\n', 'b\n', 'c\n', 'd\n', 'e\n', 'f']

However, you can always read the file in binary mode, decode it, and then split on \r\n:

>>> with open('out', 'rb') as f: f.read().split(b'\r\n')
... 
[b'a\nb', b'c', b'd\ne', b'f']

(The example is in Python 3. You can decode the bytes into str either before or after the split.)
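
A short sketch of the decode step, assuming the file is UTF-8 encoded (the encoding is an assumption; the bytes are the ones written to the 'out' file in the example):

```python
raw = b'a\nb\r\nc\r\nd\ne\r\nf'  # same bytes as the 'out' file in the example

# Decode first, then split: the result is a list of str.
lines_decoded_first = raw.decode('utf-8').split('\r\n')

# Split first, then decode each piece: same result for UTF-8 input.
lines_split_first = [part.decode('utf-8') for part in raw.split(b'\r\n')]

print(lines_decoded_first)  # ['a\nb', 'c', 'd\ne', 'f']
```

Both orders agree for UTF-8 because the bytes 0x0D 0x0A never occur inside a multi-byte sequence. For encodings such as UTF-16 that is not the case, so decoding before splitting is the safer order there.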

You can also avoid reading the whole file into memory by reading it in blocks instead. However, it becomes a bit more complex to handle the lines correctly: you have to keep the unfinished last line of each block and prepend it to the following block.
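
One possible way to do the block-wise reading is sketched below. The helper name iter_crlf_lines and its chunk_size parameter are made up for illustration, and an in-memory io.BytesIO stands in for a file opened in 'rb' mode:

```python
import io


def iter_crlf_lines(stream, chunk_size=4096):
    """Yield bytes lines split only on CR LF, reading the stream in blocks.

    Hypothetical helper; `stream` is any binary file-like object.
    """
    pending = b''
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        pending += block
        parts = pending.split(b'\r\n')
        # The last piece may be an unfinished line (or end with a lone '\r'
        # whose '\n' arrives in the next block), so carry it over.
        pending = parts.pop()
        for part in parts:
            yield part
    # Whatever is left after the final block is the last line.
    if pending:
        yield pending


# A tiny chunk size forces terminators to straddle block boundaries.
stream = io.BytesIO(b'a\nb\r\nc\r\nd\ne\r\nf')
result = list(iter_crlf_lines(stream, chunk_size=4))
print(result)  # [b'a\nb', b'c', b'd\ne', b'f']
```

Unlike bytes.split, this sketch does not yield a trailing empty item when the data ends with \r\n, and the terminators themselves are dropped rather than kept.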

¹ I believe this is because universal newline support is enabled by default in all normal installations. You would have to disable it explicitly when configuring the build, and then the r and rU modes would behave differently (the first would split lines only on the OS-native line ending, the second would produce the result shown above).

Tags: python, newline, carriage-return

user3784050

2 Answers

Lukas Graf

Bakuriu
