
Remove <br> tags from a parsed Beautiful Soup list?


I currently have a for loop that iterates over all the rows I want:

page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)
tables = soup.find("td", "bodyTd")
for row in tables.findAll('tr'):

At this point, I have my information, but the <br /> tags are ruining my output.

What's the cleanest way to remove these?


asked May 08 '11 03:05

mamontazeri

2 Answers

If you want to translate the <br />'s to newlines, do something like this:

def text_with_newlines(elem):
    text = ''
    for e in elem.recursiveChildGenerator():
        if isinstance(e, basestring):
            text += e.strip()
        elif e.name == 'br':
            text += '\n'
    return text
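
The function above targets Python 2 and the older Beautiful Soup API (basestring, recursiveChildGenerator). On Python 3 with the bs4 package, a rough equivalent might look like the sketch below. It assumes bs4 is installed, and the HTML snippet and variable names are made up purely for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def text_with_newlines(elem):
    """Join the text under elem, turning each <br> into a newline."""
    parts = []
    # .descendants walks every nested node, like recursiveChildGenerator().
    for node in elem.descendants:
        if isinstance(node, NavigableString):
            parts.append(node.strip())
        elif isinstance(node, Tag) and node.name == "br":
            parts.append("\n")
    return "".join(parts)


# Hypothetical markup, only for illustration.
html = "<table><tr><td>first line<br/>second line</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")
result = text_with_newlines(cell)
print(repr(result))
```

Note that strip() is applied to each text node separately, so whitespace next to inline tags is dropped as well; whether that is what you want depends on your markup.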

answered Oct 19 '22 05:10

Mu Mind

for e in soup.findAll('br'):
    e.extract()
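
If you are on bs4 (Beautiful Soup 4), the same idea can be written with find_all. The sketch below assumes bs4 is installed and uses a made-up HTML snippet. It shows two variants: decompose() to drop the tags outright, and replace_with() to leave a space behind so the neighbouring words do not run together:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for illustration.
html = "<table><tr><td>first<br/>second</td></tr></table>"

# Variant 1: remove every <br>; the text on either side ends up adjacent.
soup = BeautifulSoup(html, "html.parser")
for br in soup.find_all("br"):
    br.decompose()
removed = soup.find("td").get_text()

# Variant 2: swap each <br> for a space to keep the words separated.
soup2 = BeautifulSoup(html, "html.parser")
for br in soup2.find_all("br"):
    br.replace_with(" ")
spaced = soup2.find("td").get_text()

print(repr(removed))
print(repr(spaced))
```

extract() still works in bs4 and returns the removed tag, which is handy if you want to reuse it; decompose() is the usual choice when the tag is simply discarded.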

answered Oct 19 '22 06:10

Kabie

