Python split() without removing the delimiter [duplicate]

Tags: python, split, delimiter

This code almost does what I need:

for line in all_lines:
    s = line.split('>')

except that it removes all the '>' delimiters. So

<html><head>

turns into

['<html', '<head', '']

Is there a way to use the split() method but keep the delimiter instead of removing it, so that the result is

['<html>', '<head>']

asked Oct 23 '11 12:10

some1

4 Answers

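d = ">"
for line in all_lines:
    # Re-append the delimiter to every non-empty piece. This assumes the line
    # ends with d; otherwise the final piece gains a delimiter it never had.
    s = [e + d for e in line.split(d) if e]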

answered Oct 17 '22 15:10

P.Melch

If you are parsing HTML with splits, you are most likely doing it wrong, unless you are writing a one-shot script for a fixed, trusted input file. If it is supposed to work on arbitrary HTML input, how will you handle something like <a title='growth > 8%' href='#something'>?
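To make the concern concrete, here is a minimal sketch (not part of the original answer) that compares a naive split with the standard library's html.parser on exactly that kind of tag. The class name StartTagCollector and the helper start_tags are made up for illustration.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: collects the raw text of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag as written in the input.
        self.tags.append(self.get_starttag_text())


def start_tags(markup):
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


tricky = "<html><head><a title='growth > 8%' href='#something'>"

# Naive split: the '>' inside the quoted attribute cuts the <a> tag in two.
naive = [piece + ">" for piece in tricky.split(">") if piece]
# ['<html>', '<head>', "<a title='growth >", " 8%' href='#something'>"]

# Parser: quoted attribute values are respected, so the tag stays whole.
parsed = start_tags(tricky)
```

The parser yields three tags where the split yields four fragments, which is the failure mode the question above points at.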

Anyway, the following works for me:

>>> import re
>>> re.split('(<[^>]*>)', '<body><table><tr><td>')[1::2]
['<body>', '<table>', '<tr>', '<td>']
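Why the [1::2] slice works: when the pattern passed to re.split contains a capturing group, the matched delimiters are kept in the result, alternating with the text between them. A small sketch, with variable names chosen only for illustration:

```python
import re

# The parentheses form a capturing group, so re.split keeps each match.
TAG = re.compile(r'(<[^>]*>)')

pieces = TAG.split('a<b>c<d>e')
tags = pieces[1::2]   # odd positions: the captured tags
text = pieces[0::2]   # even positions: whatever sat between the tags

# With back-to-back tags the "between" pieces are just empty strings.
only_tags = TAG.split('<body><table>')
```

Here pieces is ['a', '<b>', 'c', '<d>', 'e'], so the odd-indexed items are the tags and the even-indexed items are the surrounding text.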

answered Oct 17 '22 14:10

gb.

How about this:

import re
s = '<html><head>'
re.findall(r'[^>]+>', s)  # ['<html>', '<head>']
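One thing to be aware of: this pattern silently drops any text that follows the last '>'. If that text matters, a second alternative can pick it up. The following is only a sketch of that variation, and the function name split_keeping_gt is invented here:

```python
import re

# First alternative: a run of non-'>' characters closed by '>'.
# Second alternative: a final run with no closing '>' (trailing text).
PIECE = re.compile(r'[^>]+>|[^>]+$')


def split_keeping_gt(s):
    """Return the pieces of s, each keeping its trailing '>' if it had one."""
    return PIECE.findall(s)


with_tail = split_keeping_gt('<html><head>text')
without_tail = split_keeping_gt('<html><head>')
```

As with the original pattern, a bare '>' with nothing before it produces no piece at all, because both alternatives require at least one non-'>' character.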

answered Oct 17 '22 15:10

Óscar López

Just split it, then for each element in the array/list (apart from the last one) add a trailing ">" to it.
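A minimal sketch of that idea, assuming the last element should be kept only when it is non-empty (the function name is hypothetical):

```python
def split_keeping_delimiter(line, delimiter=">"):
    """Split line on delimiter, leaving the delimiter attached to each piece."""
    parts = line.split(delimiter)
    # Every piece except the last was followed by the delimiter in the input.
    result = [part + delimiter for part in parts[:-1]]
    # The last piece is whatever came after the final delimiter, if anything.
    if parts[-1]:
        result.append(parts[-1])
    return result


tags = split_keeping_delimiter("<html><head>")
mixed = split_keeping_delimiter("<html><head>text")
```

Treating the last element separately means trailing text such as "text" is returned unchanged instead of gaining a '>' it never had.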

answered Oct 17 '22 14:10

orangething
