In Python 3 in Windows 7 I read a web page into a string.
I then want to split the string into a list at newline characters.
I can't enter the newline into my code as the argument in split(), because I get a syntax error 'EOL while scanning string literal'.
If I type in the characters \ and n, I get a Unicode error.
Is there any way to do it?
Python's splitlines() method splits a string at line boundaries and returns a list of the resulting lines. Line boundaries include the newline (\n), the carriage return (\r), and several others.
Have you tried using the str.splitlines() method?
See the 2.X documentation and the 3.X documentation.
From the docs:
str.splitlines([keepends])
Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
For example:
>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()
['Line 1', '', 'Line 3', 'Line 4']
>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines(True)
['Line 1\n', '\n', 'Line 3\r', 'Line 4\r\n']
This method uses the universal newlines approach to splitting lines.
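A small sketch of what the keepends flag implies in Python 3: since every terminator stays attached to its line, joining the pieces should give back the original string. The variable names here are only illustrative.

```python
# Sample text mixing the three classic line endings.
text = "Line 1\n\nLine 3\rLine 4\r\n"

# keepends=True leaves each line's terminator attached to it.
pieces = text.splitlines(keepends=True)

# Nothing was discarded, so joining the pieces restores the input.
rebuilt = "".join(pieces)

print(pieces)
print(rebuilt == text)
```

This can be handy when the lines need to be processed one at a time and then written back out with their original line endings.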
The main difference between Python 2.X and Python 3.X is that the former uses the universal newlines approach to splitting lines, so "\r", "\n", and "\r\n" are considered line boundaries for 8-bit strings, while the latter uses a superset of it that also includes:
- \v or \x0b: Line Tabulation (added in Python 3.2).
- \f or \x0c: Form Feed (added in Python 3.2).
- \x1c: File Separator.
- \x1d: Group Separator.
- \x1e: Record Separator.
- \x85: Next Line (C1 Control Code).
- \u2028: Line Separator.
- \u2029: Paragraph Separator.
Unlike str.split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line:
>>> ''.splitlines()
[]
>>> 'Line 1\n'.splitlines()
['Line 1']
While str.split('\n') returns:
>>> ''.split('\n')
['']
>>> 'Line 1\n'.split('\n')
['Line 1', '']
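To make the Python 3 difference concrete, here is a minimal sketch comparing the two methods on a string that contains some of the extra boundary characters listed above. The sample string is made up for illustration.

```python
# A string using Line Tabulation, Form Feed, File Separator,
# Line Separator, and a Windows-style \r\n as line boundaries.
text = "a\x0bb\x0cc\x1cd\u2028e\r\nf"

# splitlines() treats every one of them as a line boundary.
by_splitlines = text.splitlines()

# split("\n") only cuts at "\n", so the "\r" and the other
# characters stay inside the first piece.
by_split = text.split("\n")

print(by_splitlines)
print(by_split)
```

If a web page happens to contain characters such as form feeds, the two approaches can therefore give noticeably different lists.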
If you also need to remove additional leading or trailing whitespace, like spaces, that are ignored by str.splitlines(), you could use str.splitlines() together with str.strip():
>>> [line.strip() for line in 'Line 1 \n \nLine 3 \rLine 4 \r\n'.splitlines()]
['Line 1', '', 'Line 3', 'Line 4']
Lastly, if you want to filter out the empty strings from the resulting list, you could use filter():
>>> # Python 2.X:
>>> filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines())
['Line 1', 'Line 3', 'Line 4']
>>> # Python 3.X:
>>> list(filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()))
['Line 1', 'Line 3', 'Line 4']
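The stripping and filtering steps can also be combined. The helper below is only a sketch; clean_lines is a hypothetical name, not part of the standard library, and it assumes that whitespace-only lines should be dropped as well.

```python
def clean_lines(text):
    """Split text into lines, strip each one, and drop the blank ones."""
    # Strip first so that whitespace-only lines become empty strings.
    stripped = (line.strip() for line in text.splitlines())
    # Empty strings are falsy, so this keeps only non-blank lines.
    return [line for line in stripped if line]


sample = 'Line 1 \n \nLine 3 \rLine 4 \r\n'
print(clean_lines(sample))
```

A list comprehension is used here instead of filter() so the same code behaves identically on Python 2.X and 3.X.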
As the error you posted indicates and Burhan suggested, the problem comes from the print call, not from the split. There's a related question that could be useful to you: UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function
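One possible workaround, sketched under the assumption that losing the unprintable characters is acceptable: replace anything the console encoding cannot represent before printing. console_safe is a hypothetical helper and the sample page text is invented; the linked question may suggest other approaches.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Invented sample standing in for the downloaded page.
page = "price: 5\u20ac\nname: caf\u00e9"

for line in page.splitlines():
    print(console_safe(line))
```

The split itself works on the full Unicode string; only the text handed to print is reduced to what the console can show.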