<p>In Python 3 on Windows 7, I read a web page into a string.</p> <p>I then want to split the string into a list at newline characters.</p> <p>I can't enter a literal newline in my code as the argument to <code>split()</code>, because I get a syntax error:</p> <blockquote> <p>'EOL while scanning string literal'</p> </blockquote> <p>If I type in the characters <code>\</code> and <code>n</code>, I get a Unicode error.</p> <p>Is there any way to do it?</p>

How to split a Python string on new line characters

1 Answer

✨ Splitting lines in Python:

Have you tried using the str.splitlines() method?

Python 2.X documentation: str.splitlines() under String Methods.
Python 3.X documentation: str.splitlines() under Built-in Types, String Methods.

From the docs:

str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

For example:

>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()
['Line 1', '', 'Line 3', 'Line 4']
>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines(True)
['Line 1\n', '\n', 'Line 3\r', 'Line 4\r\n']
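
As a rough sketch of how this might look for text shaped like a downloaded page: the variable page_text and its contents are made up for illustration. It shows that keepends=True keeps every terminator, so joining the pieces gives back the original string.

```python
# Hypothetical sample standing in for a downloaded page (mixed line endings).
page_text = "<html>\r\n<body>\nHello\r</body>\r\n</html>"

# Default call: line terminators are dropped.
lines = page_text.splitlines()

# keepends=True: each piece keeps its own terminator.
lines_with_ends = page_text.splitlines(keepends=True)

print(lines)
print(lines_with_ends)

# Nothing is lost with keepends=True, so the pieces join back to the input.
assert "".join(lines_with_ends) == page_text
```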

🤔 Which delimiters are considered?

This method uses the universal newlines approach to splitting lines.

The main difference between Python 2.X and Python 3.X is that in the former, str.splitlines() on 8-bit strings uses the universal newlines approach, so only "\r", "\n", and "\r\n" are considered line boundaries, while in the latter, str is Unicode and the method uses a superset that also includes:

\v or \x0b: Line Tabulation (added in Python 3.2).
\f or \x0c: Form Feed (added in Python 3.2).
\x1c: File Separator.
\x1d: Group Separator.
\x1e: Record Separator.
\x85: Next Line (C1 Control Code).
\u2028: Line Separator.
\u2029: Paragraph Separator.
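
The list above can be checked with a short Python 3 sketch. The helper split_basic_newlines and the sample strings are hypothetical, added only to show one possible way to split on "\n", "\r", and "\r\n" alone if the extra boundaries are unwanted.

```python
import re

text = "a\x0cb\u2028c\r\nd"

# str.splitlines() in Python 3 breaks on the whole superset listed above.
print(text.splitlines())            # ['a', 'b', 'c', 'd']

# bytes.splitlines() only recognises \n, \r and \r\n.
print(b"a\x0cb\r\nd".splitlines())  # [b'a\x0cb', b'd']

# Hypothetical helper: split a str on \r\n, \r and \n only.
# \r\n comes first in the pattern so it is matched as one boundary.
NEWLINE_RE = re.compile(r"\r\n|\r|\n")

def split_basic_newlines(s):
    return NEWLINE_RE.split(s)

# Form feed and U+2028 stay inside the first piece.
print(split_basic_newlines(text))
```

Note that re.split() behaves like str.split() here, so a trailing newline would produce a trailing empty string.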

🥊 splitlines VS split:

Unlike str.split() when a delimiter string sep is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line:

>>> ''.splitlines()
[]
>>> 'Line 1\n'.splitlines()
['Line 1']

While str.split('\n') returns:

>>> ''.split('\n')
['']
>>> 'Line 1\n'.split('\n')
['Line 1', '']
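
One more difference may matter for text with Windows-style line endings. The sample strings below are invented for illustration, and whether a given web page actually contains "\r\n" is an assumption.

```python
# Hypothetical text with Windows-style line endings.
windows_text = "Line 1\r\nLine 2\r\n"

# split('\n') leaves the '\r' attached and adds a trailing empty string.
print(windows_text.split("\n"))   # ['Line 1\r', 'Line 2\r', '']

# splitlines() treats '\r\n' as a single boundary.
print(windows_text.splitlines())  # ['Line 1', 'Line 2']

# Only the final line break is swallowed; an earlier blank line is kept.
blank_tail = "Line 1\n\n"
print(blank_tail.split("\n"))     # ['Line 1', '', '']
print(blank_tail.splitlines())    # ['Line 1', '']
```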

✂️ Removing additional whitespace:

If you also need to remove leading or trailing whitespace, such as spaces, which str.splitlines() leaves in place, you could use str.splitlines() together with str.strip():

>>> [line.strip() for line in 'Line 1  \n  \nLine 3 \rLine 4 \r\n'.splitlines()]
['Line 1', '', 'Line 3', 'Line 4']
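
If indentation matters, str.rstrip() could be used instead so that only trailing whitespace is removed. A minimal sketch with an invented sample string:

```python
# Hypothetical sample: an indented line, a tab-only line, a line with a trailing space.
text = "  indented  \n\t\nplain \r\n"

# strip() removes whitespace on both sides of each line.
stripped = [line.strip() for line in text.splitlines()]

# rstrip() removes only trailing whitespace, keeping indentation.
right_only = [line.rstrip() for line in text.splitlines()]

print(stripped)    # ['indented', '', 'plain']
print(right_only)  # ['  indented', '', 'plain']
```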

🗑️ Removing empty strings (''):

Lastly, if you want to filter out the empty strings from the resulting list, you could use filter():

>>> # Python 2.X:
>>> filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines())
['Line 1', 'Line 3', 'Line 4']

>>> # Python 3.X:
>>> list(filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()))
['Line 1', 'Line 3', 'Line 4']
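
A list comprehension is an alternative that behaves the same in Python 2 and 3. The sketch below (sample text invented) also shows that filtering on truthiness keeps whitespace-only lines, while testing line.strip() drops them too.

```python
# Hypothetical sample with one whitespace-only line and one empty line.
text = "Line 1\n   \n\nLine 3\rLine 4\r\n"

# Drops only truly empty strings; '   ' is truthy and survives.
non_empty = [line for line in text.splitlines() if line]

# Drops empty and whitespace-only lines.
non_blank = [line for line in text.splitlines() if line.strip()]

print(non_empty)  # ['Line 1', '   ', 'Line 3', 'Line 4']
print(non_blank)  # ['Line 1', 'Line 3', 'Line 4']
```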

📜 Additional comment regarding the original question:

As the error you posted indicates, and as Burhan suggested, the problem comes from the print call. There's a related question that could be useful to you: UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function
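
If the console encoding is indeed the cause (an assumption, since the full traceback is not shown here), one possible workaround is to replace unencodable characters before printing. The helper name to_printable is hypothetical.

```python
import sys

def to_printable(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    # Fall back to UTF-8 if stdout has no known encoding.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)

# Made-up sample containing characters some Windows code pages lack.
for line in "caf\u00e9 \u2192 menu\nplain".splitlines():
    print(to_printable(line))
```

This loses information (unsupported characters become "?"), so it is only a sketch for inspecting output in a limited console, not a fix for the data itself.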

132

answered Oct 03 '22 23:10

Danziger


Tags:

python

string

split

user1067305