
Regex to split on successions of newline characters

I'm trying to split a string on newline characters (catering for Windows, OS X, and Unix text file newline characters). If several of these occur in succession, I want to treat the whole run as a single separator, so that no empty strings end up in the result.

So, when splitting the following:

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

The result would be:

['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']

What regex should I use?

Humphrey Bogart asked Apr 08 '10 00:04


2 Answers

The simplest pattern for this purpose is r'[\r\n]+' which you can pronounce as "one or more carriage-return or newline characters".
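
As a minimal sketch, here is that pattern applied to the string from the question using re.split from the standard library. The variable names text and parts are only illustrative.

```python
import re

text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# Split on any run of one or more carriage-return or newline characters,
# so "\r\n\r\n", "\r\r" and "\n\n" each act as a single separator.
parts = re.split(r'[\r\n]+', text)

print(parts)
# ['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
```

Note that if the input starts or ends with a newline, re.split returns an empty string at that end; filtering with [p for p in parts if p] removes it.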

Alex Martelli answered Oct 13 '22 01:10


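If the lines contain no whitespace other than the newlines themselves, you can use s.split() with no arguments; it splits on any run of whitespace, so doubled newlines collapse. If not, and the text uses a single newline convention, you can use [a for a in s.split("\r\n") if a], which drops the empty strings.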

EDIT: the str type also has a method called "splitlines".

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix".splitlines()
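
One caveat worth showing: splitlines() recognises all three newline conventions, but it keeps an empty string for every blank line, so on its own it does not give the result asked for. A small sketch, with illustrative variable names, that filters those out:

```python
text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# splitlines() treats "\r\n", "\r" and "\n" as line boundaries,
# but a doubled newline produces an empty string in the result.
lines = text.splitlines()

# Drop the empty entries so successive newlines collapse.
non_empty = [line for line in lines if line]

print(non_empty)
# ['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
```

For str objects, splitlines() also breaks on a few other characters (form feed and the Unicode line separator, for example), so the regex approach is the stricter choice when only \r and \n should count.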

magcius answered Oct 13 '22 00:10