Using Beautiful Soup, how do I iterate over all embedded text?

Let's say I wanted to remove vowels from HTML:

<a href="foo">Hello there!</a>Hi!

becomes

<a href="foo">Hll thr!</a>H!

I figure this is a job for Beautiful Soup. How can I select the text in between tags and operate on it like this?

What is the difference between find_all() and find() in Beautiful Soup?

find() returns the first element that matches the search, or None if nothing matches. find_all() scans the entire document and returns a list of every match, which is empty if there are none.
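A minimal sketch of the difference, assuming Beautiful Soup 4 (the bs4 package) on Python 3. The two-paragraph HTML string and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative input, not taken from the question
html = '<p>first</p><p>second</p>'
soup = BeautifulSoup(html, 'html.parser')

first = soup.find('p')               # a single Tag: the first <p>
all_p = soup.find_all('p')           # a list-like ResultSet of every <p>
missing = soup.find('table')         # None, because nothing matches
none_found = soup.find_all('table')  # an empty result, not None

print(first.get_text())
print([p.get_text() for p in all_p])
print(missing, len(none_found))
```

Because find() can return None, check its result before calling methods on it; the result of find_all() can always be looped over safely.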

What is a supported parser for Beautiful Soup?

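Beautiful Soup supports Python's built-in html.parser as well as the third-party lxml and html5lib parsers. If you do not name one, it parses the document using the best parser that is installed. It will use an HTML parser unless you specifically tell it to use an XML parser.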
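As a sketch of naming the parser explicitly, assuming Beautiful Soup 4 (the bs4 package) on Python 3: the second argument to the constructor selects the parser. The markup string here is made up for illustration.

```python
from bs4 import BeautifulSoup

# Deliberately sloppy markup: neither tag is closed
markup = '<p>Hello<b>world'

# 'html.parser' ships with Python, so no extra install is needed.
# 'lxml' and 'html5lib' are alternatives, and 'xml' selects the lxml
# XML parser; each of those requires the matching package to be installed.
soup = BeautifulSoup(markup, 'html.parser')

print(soup.find('b').get_text())
```

Naming the parser keeps results consistent between machines, since different parsers can repair invalid markup in different ways.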

Suppose the variable test_html has the following HTML content:

<html>
<head><title>Test title</title></head>
<body>
<p>Some paragraph</p>
Useless Text
<a href="http://stackoverflow.com">Some link</a>not a link
<a href="http://python.org">Another link</a>
</body></html>

Just do this:

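from bs4 import BeautifulSoup  # Beautiful Soup 4 on Python 3

test_html = load_html_from_above()  # placeholder for the HTML shown above
soup = BeautifulSoup(test_html, 'html.parser')

# find_all(string=True) returns the text nodes only, never tags or attributes
for t in soup.find_all(string=True):
    text = str(t)
    for vowel in 'aeiou':
        text = text.replace(vowel, '')
    t.replace_with(text)

print(soup)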

That prints:

<html>
<head><title>Tst ttl</title></head>
<body>
<p>Sm prgrph</p>
Uslss Txt
<a href="http://stackoverflow.com">Sm lnk</a>nt  lnk
<a href="http://python.org">Anthr lnk</a>
</body></html>

Note that the tags and attributes are untouched.
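To check that claim against the one-line example from the question, here is a minimal self-contained sketch. It assumes Beautiful Soup 4 (the bs4 package, version 4.4 or later for the string= argument) on Python 3; the function name strip_vowels and the regular expression are illustrative choices, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Lowercase vowels only, matching the behaviour of the answer's loop
VOWELS = re.compile(r'[aeiou]')

def strip_vowels(html):
    """Return html with lowercase vowels removed from text nodes only."""
    soup = BeautifulSoup(html, 'html.parser')
    # find_all returns a list up front, so replacing nodes in the loop is safe
    for node in soup.find_all(string=True):
        node.replace_with(VOWELS.sub('', str(node)))
    return str(soup)

result = strip_vowels('<a href="foo">Hello there!</a>Hi!')
print(result)
```

The href value "foo" keeps its vowels because attribute values are not text nodes, so find_all(string=True) never returns them. Be aware that it does return comments and similar special strings, which this sketch would turn into plain text.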

Tags:

python

beautifulsoup

mike

1 Answers

nosklo
